JP2011253126A - Voice signal processor and control method thereof

Publication number: JP2011253126A
Application number: JP2010128270A
Authority: JP
Inventors: Masashi Kimura (木村 正史); Fumihiro Kajimura (梶村 文裕)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2010-06-03
Filing date: 2010-06-03
Publication date: 2011-12-15
Anticipated expiration: 2030-06-03
Also published as: JP5473786B2

Abstract

PROBLEM TO BE SOLVED: To reduce the possibility of subtracting too much spectrum when noise contained in audio is reduced by spectral subtraction.

SOLUTION: An audio signal processing device having a plurality of drive units comprises: generating means that generates an audio signal from sound around the device; and subtraction means that applies spectral subtraction to the frequency spectrum of the audio signal to reduce the drive sounds that accompany operation of the drive units and are contained in the audio signal. When the plurality of drive units operate simultaneously, the subtraction means subtracts, for each frequency, the maximum value among the frequency components of the respective drive sounds from the frequency component of the audio signal.

Description

The present invention relates to an audio signal processing device and a control method thereof.

In recent years, photographing apparatuses capable of shooting moving images, such as cameras, have become known as audio signal processing devices. When such an apparatus shoots a moving image with sound, it is desirable to suppress the influence of the drive sound (noise) produced by driving the drive units inside the apparatus.

The technique of Patent Document 1 is known for suppressing the influence of drive sound. Patent Document 1 discloses a method of removing, by spectral subtraction, the drive noise generated by photographing operations. In that method, when a plurality of drive units are driven simultaneously, the previously obtained spectrum representing the drive sound of each drive unit is subtracted from the spectrum of the sound input from the microphone.

Patent Document 1: JP 2006-279185 A

However, the phases of the drive sounds of the drive units do not necessarily match, and the drive sounds may interfere with and cancel one another. Consequently, if the spectra representing the drive sounds of the individual drive units are simply subtracted, more spectrum than necessary may be subtracted from the spectrum of the sound input from the microphone, which can distort the sound.

The present invention has been made in view of this situation, and its object is to reduce the possibility of subtracting more spectrum than necessary when noise contained in audio is reduced by spectral subtraction.

To solve the above problem, a first aspect of the present invention provides an audio signal processing device including a plurality of drive units, the device comprising: generating means for generating an audio signal from sound around the audio signal processing device; and subtracting means for applying spectral subtraction to the frequency spectrum of the audio signal in order to reduce the drive sounds that accompany operation of the plurality of drive units and are contained in the audio signal, wherein, when the plurality of drive units are operating simultaneously, the subtracting means subtracts, for each frequency, the maximum value among the frequency components of the drive sounds accompanying the operation of the respective drive units from the frequency component of the audio signal.

Other features of the present invention will become more apparent from the accompanying drawings and the following description of the embodiments.

With the above configuration, the present invention makes it possible to reduce the possibility of subtracting more spectrum than necessary when noise contained in audio is reduced by spectral subtraction.

FIG. 1: (a) is a perspective view of the photographing apparatus 1 fitted with the lens 2, which is an example of the audio signal processing device of the present invention, and (b) is a cross-sectional view of the photographing apparatus 1 and the lens 2.

FIG. 2 is a block diagram showing the electrical configuration of the photographing apparatus 1 and the lens 2.

FIG. 3 is a block diagram showing the detailed configuration of the audio processing circuit 26.

FIG. 4 is a block diagram showing the detailed configuration of the noise processing unit 44.

FIG. 5 is an operation sequence diagram of the photographing apparatus 1.

FIG. 6 schematically shows the waveform of one extracted frequency component of the drive sound: (a) shows the case where the two drive sounds are in phase, and (b) shows the case where the two drive sounds are out of phase.

FIG. 7 is a schematic diagram of the spectrum of the audio signal generated by the microphone 7 of the photographing apparatus 1.

FIG. 8 shows the relationship between the gain ratio between a plurality of drive sounds and the gain of the combined wave.

FIG. 9 is a schematic diagram of the combined spectrum of two drive sounds: (a) shows the case where the combined spectrum is generated by simple addition, (b) shows the case where the combined spectrum is obtained according to Equation 5, and (c) shows the case where the larger of the frequency components of the two drive sounds is taken as the combined spectrum.

Embodiments of the present invention will be described below with reference to the accompanying drawings. The technical scope of the present invention is determined by the claims and is not limited by the individual embodiments below. In addition, not all combinations of the features described in the embodiments are essential to the present invention.

[First Embodiment]

FIG. 1(a) is a perspective view of a photographing apparatus 1 fitted with a lens 2, which is an example of the audio signal processing device of the present invention. The photographing apparatus 1 includes a microphone 7 (see FIG. 1(b)) and can acquire and record sound at the same time as it acquires images. In FIG. 1(a), 4 denotes the optical axis of the lens 2, 30 denotes a release button, 31 denotes an operation button, and 32 denotes an opening for the microphone 7.

FIG. 1(b) is a cross-sectional view of the photographing apparatus 1 and the lens 2. In FIG. 1(b), components that are the same as or similar to those in FIG. 1(a) are denoted by the same reference numerals, and their description is omitted. In FIG. 1(b), 3 denotes a photographing optical system, 5 denotes a lens barrel, 6 denotes an image sensor serving as imaging means that photoelectrically converts incident light from the photographing optical system 3 to generate image data, 7 denotes the microphone, and 8 denotes a display device provided on the back of the photographing apparatus 1. Further, 9 denotes a drive unit for adjusting the photographing optical system 3, 10 denotes a contact connecting the photographing apparatus 1 and the lens 2, 11 denotes a so-called quick return mirror mechanism, 12 denotes a focus detection unit including an AF sensor, and 14 denotes a blur sensor. The blur sensor 14 consists of, for example, an acceleration sensor; in this embodiment it is called the blur sensor 14 because it detects camera vibration caused by shake of the user's hands.

As shown in FIG. 1(a), the photographing apparatus 1 has a plurality of microphone openings 32 at locations that do not appear in the cross section of FIG. 1(b). They are nevertheless drawn schematically in FIG. 1(b) to make the presence of the microphone 7 and the openings 32 clear.

The still image shooting operation is described first. The photographing apparatus 1 performs focus detection and exposure detection using the lens 2, the focus detection unit 12, and an exposure detection unit (not shown). The photographing apparatus 1 also drives part of the photographing optical system 3 and, by adjusting the photographing optical system 3, forms an image in the vicinity of the image sensor 6. The photographing apparatus 1 further operates the aperture so as to obtain proper exposure. The detailed operation is described later with reference to the block diagram of FIG. 2. The photographing apparatus 1 also sets various shooting conditions according to the user's operation of the operation button 31 and the like, obtains subject information from the image sensor 6 in synchronization with operation of the release button 30, and records it in the memory 24 (see FIG. 2).

Next, the moving image shooting operation is described. Before a moving image is shot, pressing a live view button (not shown) causes the image from the image sensor 6 to be displayed on the display device 8. Displaying the image acquired by the image sensor 6 on the display device 8 in real time is called "live view". In synchronization with operation of a moving image shooting button (not shown), the photographing apparatus 1 obtains subject information from the image sensor 6 at the set frame rate, obtains an audio signal from the microphone 7, and records the two in the memory 24 in synchronization with each other. If the photographing optical system 3 needs to be adjusted during moving image shooting, the photographing apparatus 1 adjusts it as needed using the drive unit 9. The photographing apparatus 1 ends shooting in synchronization with operation of the moving image shooting button. Even during moving image shooting, the photographing apparatus 1 can shoot a still image at any time when the release button 30 is operated.

FIG. 2 is a block diagram showing the electrical configuration of the photographing apparatus 1 and the lens 2. The photographing apparatus 1 fitted with the lens 2 has an imaging system, an image processing system, an audio processing system, a recording/reproducing system, and a control system. The imaging system includes the photographing optical system 3 and the image sensor 6. The image processing system includes an A/D converter 20 and an image processing circuit 21. The audio processing system includes the microphone 7 and an audio processing circuit 26. The recording/reproducing system includes a recording processing circuit 23 and a memory 24. The control system includes a camera system control circuit 25, the focus detection unit 12 including an AF sensor, an exposure detection unit 13 including an AE sensor, the blur sensor 14, an operation detection circuit 27, a lens system control circuit 28, the release button 30, and the drive unit 9. The drive unit 9 includes a focus lens drive unit 9a, a blur correction drive unit 9b, and an aperture drive unit 9c, which respectively drive the focus lens, the blur correction lens, and the aperture included in the lens 2. The blur correction drive unit 9b moves the lens so that the optical image is not blurred by the vibration of the photographing apparatus 1 detected by the blur sensor 14.

The focus lens drive unit 9a, the blur correction drive unit 9b, and the aperture drive unit 9c have been given as examples of drive units, but this embodiment is not limited to them. The photographing apparatus 1 fitted with the lens 2, as the audio signal processing device of the present invention, need only include a plurality of drive units of any kind.

The imaging system is an optical processing system that focuses light from an object onto the imaging surface of the image sensor 6 via the photographing optical system 3. During preliminary shooting operations such as aiming, part of the light beam is also guided to the focus detection unit 12 via a mirror provided in the quick return mirror mechanism 11. As described later, the control system adjusts the photographing optical system 3 appropriately so that the image sensor 6 is exposed to an appropriate amount of object light and the subject image is formed in the vicinity of the image sensor 6.

The image processing circuit 21 is a signal processing circuit that processes the image data received from the image sensor 6 via the A/D converter 20. It has a white balance circuit, a gamma correction circuit, an interpolation circuit that increases resolution by interpolation, and the like.

The audio processing system generates an audio signal for recording by having the audio processing circuit 26 apply appropriate processing to the audio signal generated by the microphone 7. During moving image shooting, the audio signal for recording is linked with the images and compressed by the recording processing unit described below.

The recording processing circuit 23 outputs image data to the memory 24, and generates and stores the image to be output to a display unit 22. The recording processing circuit 23 also compresses images, moving images, audio, and the like using a predetermined method.

The camera system control circuit 25 generates and outputs timing signals and the like for imaging. The focus detection unit 12 detects the focus state of the photographing apparatus 1. The exposure detection unit 13 detects the luminance of the subject by processing the signal from the image sensor 6. The lens system control circuit 28 adjusts the photographing optical system 3 by driving the lens 2 appropriately in accordance with signals from the camera system control circuit 25.

The control system also controls the imaging system, the image processing system, and the recording/reproducing system in response to external operations. In still image shooting, for example, the operation detection circuit 27 detects that the release button 30 has been pressed, and the control system controls the driving of the image sensor 6, the operation of the image processing circuit 21, the compression processing of the recording processing circuit 23, and so on. The control system also controls, via the display unit 22, the state of each segment of an information display device that displays information on an optical viewfinder, a liquid crystal monitor, or the like.

The adjustment of the photographing optical system 3 by the control system is described next. The focus detection unit 12 and the exposure detection unit 13 are connected to the camera system control circuit 25, which in still image shooting determines an appropriate focus position and aperture position from their signals. The camera system control circuit 25 issues commands to the lens system control circuit 28 via the electrical contact 10, and the lens system control circuit 28 controls the focus lens drive unit 9a and the aperture drive unit 9c accordingly. In moving image shooting, on the other hand, the camera system control circuit 25 has the focus lens drive unit 9a move the focus lens slightly while it analyzes the signal from the image sensor 6, and determines the focus position from the contrast of the signal. The camera system control circuit 25 also determines the aperture position from the signal level of the image sensor 6.

The blur sensor 14 is connected to the lens system control circuit 28. In the mode that performs camera shake correction in still image shooting, the lens system control circuit 28 controls the blur correction drive unit 9b based on the signal from the blur sensor 14. In the mode that performs camera shake correction in moving image shooting, the blur correction drive unit 9b can be driven in the same way as in still image shooting, or so-called electronic image stabilization can be performed, in which the readout position of the image sensor 6 is changed based on the signal from the blur sensor 14.

Now consider shooting that involves audio recording, such as moving image shooting. In such shooting, the sound that accompanies driving of the actuators of the photographing apparatus 1 and the lens 2 (hereinafter "mechanical drive sound") is unwanted sound and constitutes noise. In this specification, "noise" refers to this mechanical drive sound, not to background noise such as white noise.

FIG. 3 is a block diagram showing the detailed configuration of the audio processing circuit 26. In FIG. 3, components that are the same as or similar to those in FIG. 2 are denoted by the same reference numerals, and their description is omitted. In FIG. 3, 41 denotes a gain adjustment unit, 42 denotes a filter, 43 denotes an A/D converter, 44 denotes a noise processing unit, and 45 denotes a filter.

The microphone 7 generates an audio signal from the sound around the photographing apparatus 1, and the generated audio signal is supplied to the gain adjustment unit 41. The gain adjustment unit 41 adjusts the signal level of the microphone 7 so that the dynamic range of the A/D converter 43 is fully used. That is, when the signal level of the microphone 7 is low, the gain is raised to amplify the signal, and when the signal level is high, the gain is lowered to prevent saturation. The filter 42 consists of, for example, a low-pass filter whose cutoff frequency is chosen with the sampling frequency of the A/D converter 43 in mind. If, for example, the microphone 7 is near an element that emits a specific frequency, the filter 42 may include a suitable notch filter in addition to the low-pass filter. The A/D converter 43 converts the analog signal processed by the gain adjustment unit 41 and the filter 42 into a digital signal.
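The gain adjustment described above can be illustrated with a minimal Python sketch. It is not the circuit of the gain adjustment unit 41, which acts on the analog signal before conversion; it only shows the stated rule (raise the gain for a weak signal, lower it for a strong one) on a block of samples. The function name `adjust_gain` and the parameters `full_scale` and `headroom` are assumptions introduced for illustration.

```python
import numpy as np

def adjust_gain(samples, full_scale=1.0, headroom=0.9):
    """Scale a block so its peak sits at headroom * full_scale.

    full_scale stands in for the converter's input range (assumed value);
    headroom leaves a margin below saturation (assumed value).
    """
    peak = np.max(np.abs(samples))
    if peak == 0.0:
        # Silent block: nothing to scale.
        return samples.copy(), 1.0
    gain = headroom * full_scale / peak
    return samples * gain, gain
```

A weak block yields a gain above 1 and a strong block yields a gain below 1, so both end up using most of the available range without exceeding it.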

The noise processing unit 44 functions as the subtracting means and applies spectral subtraction processing (SS processing) based on the spectral subtraction method (SS method) to the audio signal. SS is an abbreviation of Spectral Subtraction.

SS processing is described here. A noise spectrum is prepared in advance (in this specification, a spectrum obtained from noise by, for example, a Fourier transform is called a noise spectrum), and the noise spectrum is subtracted from the spectrum of the acquired audio signal. In this embodiment, the noise spectrum is assumed to have been identified in advance and stored in storage means such as the memory 24 in the photographing apparatus 1. In this specification, "spectrum" means "frequency spectrum" unless otherwise noted.

The SS method assumes that noise is mixed into the subject sound additively. Expressed as an equation, this is

$$x(t) = s(t) + n(t) \tag{1}$$

where x(t) is the acquired sound, s(t) is the subject sound, n(t) is the noise, and t is time. Taking the Fourier transform of Equation 1 gives

$$X(\omega) = S(\omega) + N(\omega) \tag{2}$$

where X(ω), S(ω), and N(ω) are the Fourier transforms of x(t), s(t), and n(t), respectively, and ω is frequency. In the photographing apparatus 1, the audio signal is divided into frames by applying a suitable window function and the frames are processed in sequence; for simplicity, the description here focuses on one particular frame. As Equation 2 makes clear, S(ω) can be obtained by subtracting N(ω) from X(ω). This gives the following equation.

$$|S'(\omega)| = \begin{cases} |X(\omega)| - |N'(\omega)| & \text{if } |X(\omega)| - |N'(\omega)| > \beta \\ \beta & \text{otherwise} \end{cases}, \qquad \angle S'(\omega) = \angle X(\omega) \tag{3}$$

Here N'(ω) is the estimate of N(ω), S'(ω) is the estimate of S(ω) obtained using N'(ω), and β is the flooring coefficient. ∠ denotes the operation that takes the argument of a complex number. As Equation 3 shows, the magnitude is reduced using the noise spectrum given in advance, while the phase of X(ω) is used unchanged. The flooring coefficient β is introduced to suppress the audio distortion caused by the SS method (in the original SS method, β = 0). As shown in Equation 3, the SS method assumes that noise acts additively. In practice, however, noise components may be added with inverted phase, so that several noise components weaken one another within the audio signal. The result of subtracting N'(ω) from X(ω) can therefore become negative. For this reason, the SS method replaces any result smaller than β with β.

Finally, S'(ω) is inverse Fourier transformed to obtain s'(t), which is taken as the audio after SS processing.
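The single-frame procedure described above (transform, subtract the estimated noise magnitude, floor at β, keep the phase of X(ω), inverse transform) can be sketched in Python with NumPy as follows. This is an illustrative sketch and not the implementation of the noise processing unit 44; the function name `spectral_subtract` and the use of a real-input FFT are assumptions.

```python
import numpy as np

def spectral_subtract(frame, noise_mag, beta=0.0):
    """Spectral subtraction on one frame.

    frame     : real-valued samples x(t) of one frame
    noise_mag : estimated noise magnitude |N'(w)|, one value per rfft bin
    beta      : flooring coefficient (0 in the original SS method)
    """
    X = np.fft.rfft(frame)                  # X(w)
    mag = np.abs(X) - noise_mag             # subtract the noise magnitude
    mag = np.maximum(mag, beta)             # values below beta become beta
    S = mag * np.exp(1j * np.angle(X))      # reuse the phase of X(w)
    return np.fft.irfft(S, n=len(frame))    # s'(t)
```

For a frame of length N, `noise_mag` needs N // 2 + 1 entries, matching the number of bins returned by `np.fft.rfft`.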

For the SS method to work properly, it is important that the noise spectrum estimated in advance be close to the noise actually acquired. If the noise spectrum is not appropriate, the result is under-subtraction or over-subtraction. With under-subtraction, noise removal is insufficient. With over-subtraction, many isolated peaks appear in the spectral domain and produce a harsh noise known as musical noise.

FIG. 4 schematically shows the SS processing described above. In FIG. 4, components that are the same as or similar to those in FIG. 2 are denoted by the same reference numerals, and their description is omitted. In FIG. 4, FFT 44a denotes fast Fourier transform processing including window function processing, IFFT 44c denotes inverse fast Fourier transform processing, and S'(ω) estimation 44b denotes the processing of Equation 3. As FIG. 4 makes clear, the SS method can also be applied to a single-channel signal (monaural audio). On the other hand, N'(ω) must be supplied in advance by some means. In the example of FIG. 4, a noise spectrum generation unit 44d generates the estimated noise spectrum N'(ω).
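The chain of FIG. 4 (windowed FFT, S'(ω) estimation, IFFT) applied frame by frame might look like the following Python sketch. The source does not specify the window function, the frame length, or how frames are recombined; the periodic Hann window, the 50% overlap, and the overlap-add reconstruction used here are assumptions chosen because shifted copies of that window sum to one. The function name `denoise` is also hypothetical.

```python
import numpy as np

def denoise(signal, noise_mag, frame_len=256, beta=0.0):
    """Frame-wise spectral subtraction with 50% overlap-add (assumed scheme)."""
    hop = frame_len // 2
    # Periodic Hann window: copies shifted by frame_len / 2 sum to 1.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame_len) / frame_len)
    out = np.zeros(len(signal))
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window   # windowing (44a)
        X = np.fft.rfft(frame)                             # FFT (44a)
        mag = np.maximum(np.abs(X) - noise_mag, beta)      # S'(w) estimation (44b)
        S = mag * np.exp(1j * np.angle(X))                 # keep the phase of X(w)
        out[start:start + frame_len] += np.fft.irfft(S, n=frame_len)  # IFFT (44c)
    return out
```

With an all-zero `noise_mag` the output reproduces the input wherever two frames overlap, which is a convenient sanity check before supplying a real noise estimate. The first and last half frames are covered by only one window and are therefore attenuated.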

The camera system control circuit 25 obtains the pre-estimated noise spectra from the memory 24 and stores them in a temporary storage unit 44f in the noise spectrum generation unit 44d. In accordance with control signals from the camera system control circuit 25, the noise spectrum generation unit 44d selects and combines noise spectra as appropriate (using a spectrum synthesis unit 44e) to generate the estimated noise spectrum N'(ω). The generation of the estimated noise spectrum N'(ω) is described in detail below together with the operation of the photographing apparatus 1.

When the lens 2 is attached to the photographing apparatus 1 or when the power is turned on, the camera system control circuit 25 obtains from the memory 24 the noise spectra corresponding to the drive unit 9 in the lens 2 and supplies them to the noise spectrum generation unit 44d. In the example shown in FIG. 2, the noise spectra of the focus lens drive unit 9a, the blur correction drive unit 9b, and the aperture drive unit 9c are supplied to the noise spectrum generation unit 44d. The noise spectrum generation unit 44d stores the supplied noise spectra in the temporary storage unit 44f.

Next, user operations before and during moving image shooting and the processing inside the photographing apparatus 1 are described with reference to FIG. 5. To shoot a moving image, the user presses the live view button (not shown) and then operates the moving image shooting button, as described above. At this point, the camera system control circuit 25 instructs blur correction to start and performs a focus adjustment operation. The focus adjustment operation can also be performed again during moving image shooting through an appropriate user operation. For example, half-pressing the release button 30 triggers the focus adjustment operation (the release button 30 is a two-stage push switch, and pressing it to the first stage is called a "half-press"). FIG. 5 shows the sequences of user operation, blur correction driving, and focus driving in this case. The horizontal axis of FIG. 5 is time, and the vertical axis shows the state of each sequence. In FIG. 5, t1 indicates the period during which focus driving is performed for the focus adjustment operation.

During moving image shooting, the noise processing unit 44 receives the drive state of each drive unit from the camera system control circuit 25 and performs noise processing as shown in FIG. 5. That is, while only blur correction driving is being performed, the noise processing unit 44 processes only the noise (drive sound) that accompanies operation of the blur correction drive unit 9b. When blur correction driving and focus driving are performed at the same time (that is, when a plurality of drive units are operating simultaneously), the noise processing unit 44 processes the noise that accompanies operation of the blur correction drive unit 9b together with the noise that accompanies operation of the focus lens drive unit 9a. Thus, in the photographing apparatus 1 fitted with the lens 2, sometimes a single drive unit (noise source) operates and sometimes a plurality of drive units (noise sources) operate simultaneously.
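How the estimated noise spectrum could follow the drive state can be sketched in Python as below. The sketch combines the stored spectra of the active drive units by taking the per-frequency maximum, which is the combination stated in the summary of the invention. The dictionary layout, the drive unit names used as keys, and the function name `estimate_noise` are assumptions made for illustration and are not taken from the source.

```python
import numpy as np

def estimate_noise(stored, active):
    """Build N'(w) from the spectra of the drive units that are running.

    stored : dict mapping a drive unit name to its stored magnitude spectrum
    active : names of the drive units currently operating
    """
    spectra = [stored[name] for name in active]
    if not spectra:
        # No drive unit is running, so there is no drive sound to subtract.
        return np.zeros_like(next(iter(stored.values())))
    # Per-frequency maximum over the active drive units.
    return np.maximum.reduce(spectra)
```

With only one active drive unit the function returns that unit's spectrum unchanged, which corresponds to the case in FIG. 5 where only blur correction driving is performed.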

Consider the waveform of the driving sound when a plurality of driving units are operating simultaneously. FIG. 6 schematically shows waveforms obtained by extracting one frequency component of the driving sound. In FIG. 6, the horizontal axis represents time and the vertical axis represents sound pressure (loudness). FIG. 6(a) shows the case where the phases are aligned, and FIG. 6(b) shows the case where the phases are shifted by some amount. In FIG. 6, G1 denotes the one-sided amplitude of the sound pressure generated by driving unit 1, G2 denotes the one-sided amplitude of the sound pressure generated by driving unit 2, and Gs denotes the one-sided amplitude of the combined waveform of driving unit 1 and driving unit 2. As FIG. 6(a) shows, when the phases of driving unit 1 and driving unit 2 coincide, the waves reinforce each other and Gs = G1 + G2 holds. In this case, the driving sound can be removed by conventional spectral subtraction. In general, however, it is very rare for the phases of driving unit 1 and driving unit 2 to coincide at a given frequency component. In most cases the phases differ, as in FIG. 6(b), so that Gs < G1 + G2 is the typical state.

FIG. 7 is a schematic diagram of the spectrum of the audio signal generated by the microphone 7 of the photographing apparatus 1. In the following description of FIG. 7, driving unit 1 is the focus lens driving unit 9a and driving unit 2 is the blur correction driving unit 9b. Focusing on a specific frequency f1, if the driving sound of the focus lens driving unit 9a and that of the blur correction driving unit 9b are in phase, subtracting (G1 + G2) from the frequency component Sa at frequency f1 yields the noise-free component S1. If the two driving sounds are out of phase, however, the actual driving-sound component Gs' contained in Sa satisfies Gs' < G1 + G2. Therefore, letting S1' be the actual audio component with the noise removed, subtracting (G1 + G2) from Sa results in an excess subtraction of (S1' - S1). To prevent such excess subtraction, the true Gs (denoted Gs' in FIG. 7) must be calculated according to the following equation (Equation 4).
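The equation itself is not reproduced in this text. A plausible reconstruction, assuming Gs is the amplitude of the sum of two sinusoids of the same frequency with amplitudes G1 and G2 and phase difference θ, is the following. It is an assumption based on that standard result, not the patent's verbatim notation.

\[
G_s = \sqrt{G_1^2 + G_2^2 + 2\,G_1 G_2 \cos\theta} \qquad \text{(Equation 4, reconstructed)}
\]

This form reduces to G1 + G2 when the two sounds are in phase and to |G1 - G2| when they are in opposite phase.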

Here, θ is the phase difference between driving unit 1 (focus lens driving unit 9a) and driving unit 2 (blur correction driving unit 9b). As is clear from Equation 4, when θ = 0, Gs = G1 + G2 holds as described above, and when θ = π, Gs = |G1 - G2|. In general, Gs lies between these two values. As described above, information on the driving sound generated by the driving unit 9 of the photographing apparatus 1 with the lens 2 attached is known in advance, so G1 and G2 in Equation 4 are known. Therefore, Gs can be obtained once θ is known.
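A minimal numerical sketch of the dependence of Gs on θ, assuming Equation 4 has the usual form for the sum of two equal-frequency sinusoids (the equation image is absent from this text, so the formula and the function name below are assumptions):

```python
import math


def combined_amplitude(g1, g2, theta):
    """One-sided amplitude of g1*sin(wt) + g2*sin(wt + theta).

    Assumed form: sqrt(g1^2 + g2^2 + 2*g1*g2*cos(theta)).
    The max(0.0, ...) guards against tiny negative rounding errors.
    """
    return math.sqrt(max(0.0, g1 * g1 + g2 * g2 + 2.0 * g1 * g2 * math.cos(theta)))


in_phase = combined_amplitude(3.0, 4.0, 0.0)            # equals G1 + G2
opposite = combined_amplitude(3.0, 4.0, math.pi)        # equals |G1 - G2|
quadrature = combined_amplitude(3.0, 4.0, math.pi / 2)  # lies between the two
```

For any other phase difference the result stays between the two limiting values, which is why subtracting G1 + G2 generally removes too much.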

However, because θ is difficult to obtain, the expected value of Equation 4 is used instead. For example, the following equation (Equation 5) may be calculated.
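The equation is not reproduced in this text. Assuming that θ is uniformly distributed over one full cycle (the source does not state the distribution at this point) and that the composite amplitude has the usual form for two equal-frequency sinusoids, the expected value would be:

\[
\bar{G_s} = \frac{1}{2\pi} \int_0^{2\pi} \sqrt{G_1^2 + G_2^2 + 2\,G_1 G_2 \cos\theta}\; d\theta \qquad \text{(Equation 5, reconstructed)}
\]

Under these assumptions, equal amplitudes G1 = G2 = G give an expected value of 4G/π, or about 1.27G, compared with 2G for simple addition.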

As a result, the one-sided amplitude Gs of the time-domain composite signal of the plurality of driving sounds is obtained as an expected value with respect to the phase difference between the driving sounds.

The noise spectrum generation unit 44d of the photographing apparatus 1 obtains the expected value Gs for each frequency by solving Equation 5 (the values may instead be calculated in advance on another computer or the like and stored in the memory 24 or the like). Then, in the S'(ω) estimation 44b, the expected value Gs is subtracted, for each frequency, from the frequency component of the audio signal generated by the microphone 7. This reduces noise while lowering the possibility of subtracting more of the spectrum than necessary.
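The per-frequency processing described above might be sketched as follows. The sketch makes several assumptions not stated in the source: that θ is uniform on [0, 2π), that the expectation is evaluated by a simple midpoint-rule sum, and that negative results are floored at zero. All function names are illustrative.

```python
import math


def expected_amplitude(g1, g2, steps=3600):
    """Mean of sqrt(g1^2 + g2^2 + 2*g1*g2*cos(theta)), theta uniform on [0, 2*pi).

    Midpoint-rule approximation of the (assumed) expected-value integral.
    """
    total = 0.0
    for k in range(steps):
        theta = 2.0 * math.pi * (k + 0.5) / steps
        total += math.sqrt(max(0.0, g1 * g1 + g2 * g2 + 2.0 * g1 * g2 * math.cos(theta)))
    return total / steps


def subtract_expected_noise(signal_mag, noise1_mag, noise2_mag):
    """Subtract the expected composite noise amplitude from each frequency bin.

    Flooring at 0.0 is an assumption of this sketch, not taken from the source.
    """
    return [
        max(0.0, s - expected_amplitude(a, b))
        for s, a, b in zip(signal_mag, noise1_mag, noise2_mag)
    ]


# Two bins where only the first noise source contributes.
cleaned = subtract_expected_noise([2.0, 0.5], [1.0, 1.0], [0.0, 0.0])
```

Because the noise spectra are known in advance, the expected values could equally be tabulated offline, as the parenthetical remark in the text suggests.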

A lookup table can also be used to obtain the expected value Gs. A specific method will be described with reference to FIG. 8. FIG. 8 shows how Gs changes with the ratio of G1 to G2 when the larger of G1 and G2 is normalized to 1. The horizontal axis of FIG. 8 is the smaller gain when the larger of G1 and G2 is set to 1. The vertical axis of FIG. 8 is the corresponding ratio of Gs to the larger gain (Gs/G1 when G1 is the larger).

That is, the lookup table of FIG. 8 takes as its input (horizontal axis) the ratio between the frequency components of the plurality of driving sounds, and gives as its output (vertical axis) the magnification of the expected value Gs relative to the largest frequency component. The lookup table is generated by calculating these output magnifications in advance.

To obtain the expected value Gs when, for example, G1 > G2, the ratio G2/G1 is first calculated. The ratio of Gs to G1 for that input is then read from the graph of FIG. 8. Since Gs = G1 * (Gs/G1), the expected value Gs is obtained by multiplying G1 by the predetermined magnification read from FIG. 8. In this way, the expected value Gs can be obtained with a reduced amount of calculation.
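The lookup-table procedure above can be sketched as follows. The table size of 33 entries, the linear interpolation between entries, and the uniform-phase assumption used to fill the table are choices made for this illustration only; the source does not specify them.

```python
import math


def expected_ratio(r, steps=720):
    """Expected Gs / max(G1, G2) when min(G1, G2) / max(G1, G2) = r.

    Assumes the phase difference is uniform on [0, 2*pi) (midpoint-rule sum).
    """
    total = 0.0
    for k in range(steps):
        theta = 2.0 * math.pi * (k + 0.5) / steps
        total += math.sqrt(max(0.0, 1.0 + r * r + 2.0 * r * math.cos(theta)))
    return total / steps


def build_table(size=33):
    """Precompute magnifications for ratios 0, 1/(size-1), ..., 1."""
    return [expected_ratio(i / (size - 1)) for i in range(size)]


def lookup_expected_amplitude(g1, g2, table):
    """Expected Gs = larger gain * magnification read from the table."""
    larger, smaller = max(g1, g2), min(g1, g2)
    if larger == 0.0:
        return 0.0  # no noise in this bin
    pos = (smaller / larger) * (len(table) - 1)
    i = min(int(pos), len(table) - 2)
    frac = pos - i
    # Linear interpolation between neighbouring entries (an assumption).
    magnification = table[i] + frac * (table[i + 1] - table[i])
    return larger * magnification


TABLE = build_table()
gs = lookup_expected_amplitude(3.0, 1.5, TABLE)
```

At run time only a division, a table read, and a multiplication are needed per frequency bin, which is the reduction in calculation the text refers to.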

As yet another method, only the larger of G1 and G2 (that is, the maximum value among the frequency components of the plurality of driving sounds) may be used as Gs and subtracted from the frequency component of the audio signal. Expressed as a formula,

Gs = max(G1, G2)   (Equation 6)

where max() is an operator that selects the maximum value. Equation 6 is an approximation of Equation 4 for the case where G1 >> G2 or G1 << G2. The driving sounds generated by driving unit 1 and driving unit 2 are generally not like white noise; their spectra have peaks at specific frequencies. Because these peaks account for a large proportion of the noise, it is the peaks that most need to be approximated accurately. Unless a peak of driving unit 1 and a peak of driving unit 2 coincide in both frequency and gain, G1 >> G2 or G1 << G2 holds at the peaks. Equation 6 is therefore a good approximation of the combined waveform of the driving sounds of driving unit 1 and driving unit 2. The calculation of Equation 6 can also be regarded as a variation of the lookup table method described above: it is equivalent to always setting the ratio of Gs to the larger of G1 and G2 to 1. No dedicated storage is needed to hold a constant 1. Compared with using a lookup table, using the maximum value reduces the memory usage of the photographing apparatus 1, and compared with evaluating Equation 5, it reduces the consumption of computational resources.
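A minimal sketch of the maximum-value method, assuming the driving sounds are available as magnitude spectra of equal length. The function names and the flooring of negative results at zero are assumptions of this sketch.

```python
def max_noise_estimate(noise_spectra):
    """Per-bin maximum across the magnitude spectra of all active driving sounds."""
    return [max(bins) for bins in zip(*noise_spectra)]


def subtract_max_noise(signal_mag, noise_spectra):
    """Subtract the per-bin maximum noise component from the signal spectrum.

    Flooring at 0.0 is an assumption of this sketch, not taken from the source.
    """
    estimate = max_noise_estimate(noise_spectra)
    return [max(0.0, s - n) for s, n in zip(signal_mag, estimate)]


# Illustrative two-bin spectra: each driving sound dominates a different bin.
noise = [[1.0, 0.2], [0.1, 0.8]]
estimate = max_noise_estimate(noise)
cleaned = subtract_max_noise([1.5, 0.5], noise)
```

No table and no integration are required, which reflects the memory and computation savings described in the text.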

The effect of the present invention will be described with reference to FIG. 9. In FIG. 9, the horizontal axis is frequency (in hertz, on a logarithmic axis) and the vertical axis is gain (in decibels). FIG. 9(a) shows an example in which the noise is combined by simple addition in the spectral domain, FIG. 9(b) an example in which the noise is combined using Equation 5, and FIG. 9(c) an example in which the noise is combined using Equation 6. Using a fine-grained lookup table also gives the result of FIG. 9(b). If the lookup table is made coarser to reduce its storage size, the result falls roughly between FIG. 9(b) and FIG. 9(c) (because processing with only the maximum value, as in Equation 6, is equivalent to setting every output of the lookup table to 1).

As described above, when the phase difference between driving unit 1 and driving unit 2 is unknown, generating the composite noise using Equation 5 is considered appropriate. The composite noise spectrum shown in FIG. 9(b) can therefore be regarded as appropriate. As a result, appropriate SS (spectral subtraction) processing is performed and high-quality audio can be obtained.

In the example of FIG. 9(a), in which the noise spectrum is estimated by addition in the spectral domain, the composite noise spectrum is overestimated. As a result, excessive subtraction from the subject sound may occur in the SS processing, which can produce musical noise and similar artifacts.

On the other hand, in the example of FIG. 9(c), in which the noise spectrum is estimated using the maximum value, the result is close to that of FIG. 9(b), and the noise is expected to be processed appropriately. In particular, at the peaks where the noise has a high sound pressure, the gain is almost the same as in FIG. 9(b). As a result, appropriate SS processing is performed and high-quality audio can be obtained.
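The qualitative comparison of the three estimates can be illustrated numerically. The sketch below assumes a uniformly distributed phase difference for the expected value and uses made-up amplitudes; it does not reproduce the data of FIG. 9.

```python
import math


def expected_amplitude(g1, g2, steps=720):
    """Mean composite amplitude for a phase difference uniform on [0, 2*pi)."""
    total = 0.0
    for k in range(steps):
        theta = 2.0 * math.pi * (k + 0.5) / steps
        total += math.sqrt(max(0.0, g1 * g1 + g2 * g2 + 2.0 * g1 * g2 * math.cos(theta)))
    return total / steps


def estimates_db(g1, g2):
    """Return (simple sum, expected value, maximum) in decibels.

    Both amplitudes must be positive so that the logarithms are defined.
    """
    def to_db(g):
        return 20.0 * math.log10(g)

    return (to_db(g1 + g2), to_db(expected_amplitude(g1, g2)), to_db(max(g1, g2)))


# At a spectral peak of one driving sound, one amplitude dominates.
peak = estimates_db(1.0, 0.1)

# Away from the peaks the amplitudes may be comparable.
flat = estimates_db(1.0, 1.0)
```

Under these assumptions the maximum and the expected value nearly coincide where one amplitude dominates, while simple addition is the largest of the three estimates in both cases.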

As described above, according to the present embodiment, the photographing apparatus 1 estimates the composite spectrum of the plurality of driving sounds by calculating the expected value with respect to the phase difference or by adopting the maximum value, and subtracts the estimated composite spectrum from the spectrum of the audio signal generated by the microphone 7. This makes it possible to reduce the possibility of subtracting more of the spectrum than necessary when noise contained in the audio is reduced by spectral subtraction.

[Other Embodiments]
The present invention can also be realized by executing the following processing. That is, software (a program) that realizes the functions of the above-described embodiment is supplied to a system or apparatus via a network or various storage media, and a computer (or a CPU, MPU, or the like) of the system or apparatus reads out and executes the program.

Claims

An audio signal processing device including a plurality of driving units, comprising:
generating means for generating an audio signal from sound around the audio signal processing device; and
subtracting means for applying spectral subtraction to the frequency spectrum of the audio signal in order to reduce driving sounds, contained in the audio signal, that accompany the operation of the plurality of driving units,
wherein, when the plurality of driving units are operating simultaneously, the subtracting means subtracts, for each frequency, the maximum value among the frequency components of the plurality of driving sounds accompanying the respective operations of the plurality of driving units from the frequency component of the audio signal.

An audio signal processing device including a plurality of driving units, comprising:
generating means for generating an audio signal from sound around the audio signal processing device; and
subtracting means for applying spectral subtraction to the frequency spectrum of the audio signal in order to reduce driving sounds, contained in the audio signal, that accompany the operation of the plurality of driving units,
wherein, when the plurality of driving units are operating simultaneously, the subtracting means subtracts, for each frequency, from the frequency component of the audio signal, an expected value, with respect to the phase difference between the plurality of driving sounds, of the one-sided amplitude of a composite signal obtained by combining, in the time domain, the signals corresponding to the frequency components of the plurality of driving sounds accompanying the respective operations of the plurality of driving units.

The audio signal processing device according to claim 2, wherein the subtracting means is configured to calculate the expected value by multiplying, for each frequency, the maximum value among the frequency components of the plurality of driving sounds by a predetermined magnification, and
the subtracting means is configured to acquire the predetermined magnification, for each frequency, from a lookup table that takes the ratio between the frequency components of the plurality of driving sounds as its input and outputs the magnification.

The audio signal processing device according to any one of claims 1 to 3, further comprising storage means for storing the frequency spectrum of each of the plurality of driving sounds,
wherein the subtracting means acquires the frequency component of each of the plurality of driving sounds from the storage means.

The audio signal processing device according to any one of claims 1 to 4, further comprising:
a photographing optical system; and
imaging means for photoelectrically converting incident light from the photographing optical system to generate image data,
wherein the plurality of driving units include a driving unit of the photographing optical system.

A photographic optical system including a focus lens, a diaphragm, and an image stabilization lens;
Imaging means for photoelectrically converting incident light from the imaging optical system to generate image data;
Further comprising
5. The drive unit of the focus lens, the drive unit of the diaphragm, and the drive unit of the blur correction lens, wherein the plurality of drive units includes the drive unit of the blur correction lens. Audio signal processing device.

A method of controlling an audio signal processing device including a plurality of driving units, the method comprising:
a generating step in which generating means generates an audio signal from sound around the audio signal processing device; and
a subtracting step in which subtracting means applies spectral subtraction to the frequency spectrum of the audio signal in order to reduce driving sounds, contained in the audio signal, that accompany the operation of the plurality of driving units,
wherein, in the subtracting step, when the plurality of driving units are operating simultaneously, the subtracting means subtracts, for each frequency, the maximum value among the frequency components of the plurality of driving sounds accompanying the respective operations of the plurality of driving units from the frequency component of the audio signal.

A method of controlling an audio signal processing device including a plurality of driving units, the method comprising:
a generating step in which generating means generates an audio signal from sound around the audio signal processing device; and
a subtracting step in which subtracting means applies spectral subtraction to the frequency spectrum of the audio signal in order to reduce driving sounds, contained in the audio signal, that accompany the operation of the plurality of driving units,
wherein, in the subtracting step, when the plurality of driving units are operating simultaneously, the subtracting means subtracts, for each frequency, from the frequency component of the audio signal, an expected value, with respect to the phase difference between the plurality of driving sounds, of the one-sided amplitude of a composite signal obtained by combining, in the time domain, the signals corresponding to the frequency components of the plurality of driving sounds accompanying the respective operations of the plurality of driving units.