TW201933859A

TW201933859A - Electronic device having audio-visual processing function and method thereof

Info

Publication number: TW201933859A
Application number: TW107102519A
Authority: TW
Inventors: 杜博仁; 吳星助; 胡展源; 張嘉仁; 曾凱盟
Original assignee: 宏碁股份有限公司
Priority date: 2018-01-24
Filing date: 2018-01-24
Publication date: 2019-08-16

Abstract

An electronic device having an audio-visual processing function is provided. The electronic device comprises: a controller for receiving a plurality of images of each direction and a plurality of input audio signals of each direction respectively captured by a plurality of cameras and a plurality of omnidirectional microphones, stitching the plurality of images to generate an omnidirectional image, setting at least one calibration reference point of the omnidirectional image, setting an audio-visual common origin of the omnidirectional image and the input audio signals, calculating directional patterns of left and right channels at the audio-visual common origin, and respectively incorporating the input audio signals to generate output audio signals of the left and right channels; a display for displaying at least a portion of the omnidirectional image; and a speaker for outputting the output audio signals of the left and right channels. When a user rotates the omnidirectional image displayed by the display, the controller calculates an offset angle of the rotation relative to the audio-visual common origin by using the at least one calibration reference point, adjusts the directional patterns according to the offset angle, and adjusts a non-linear weighting ratio of the input audio signals so that the output audio signals conform to the rotation of the omnidirectional image.

Description

Electronic device with audio and video processing function and method thereof

The present invention relates to electronic devices, and more particularly to an electronic device having an audio-visual processing function and a method thereof.

Current omnidirectional camera devices mainly stitch the image portion to produce an omnidirectional image. When a user rotates the omnidirectional image, it is as if the user were in the environment and turning their head to look around. However, the sound accompanying the image does not rotate with the omnidirectional image, so the audio and video fall out of sync. Therefore, there is a need for an electronic device having an audio-visual processing function and an audio-visual processing method that take the change of the sound into account and process it dynamically in synchronization with the rotation of the image, thereby providing the user with an immersive audio-visual experience.

The present invention provides an electronic device having an audio-visual processing function and an audio-visual processing method that rotate an omnidirectional image and its audio signals in synchronization.

An embodiment of the present invention provides an electronic device having an audio-visual processing function, comprising: a controller that receives a plurality of images of each direction and a plurality of input audio signals of each direction respectively captured by a plurality of cameras and a plurality of omnidirectional microphones, stitches the plurality of images to generate an omnidirectional image, sets at least one calibration reference point of the omnidirectional image, sets an audio-visual common origin of the omnidirectional image and the input audio signals, calculates directional patterns of left and right channels at the audio-visual common origin, and respectively incorporates the input audio signals to generate output audio signals of the left and right channels; a display for displaying at least a portion of the omnidirectional image; and a speaker for outputting the output audio signals of the left and right channels; wherein, when a user rotates the omnidirectional image displayed by the display, the controller uses the at least one calibration reference point to calculate an offset angle of the rotation relative to the audio-visual common origin, adjusts the directional patterns according to the offset angle, and adjusts a non-linear weighting ratio of the input audio signals so that the output audio signals conform to the rotation of the omnidirectional image.

Another embodiment of the present invention provides an audio-visual processing method for an electronic device including a controller, a display, and a speaker. The method comprises: by the controller, receiving a plurality of images of each direction and a plurality of input audio signals of each direction respectively captured by a plurality of cameras and a plurality of omnidirectional microphones, stitching the plurality of images to generate an omnidirectional image, setting at least one calibration reference point of the omnidirectional image, setting an audio-visual common origin of the omnidirectional image and the input audio signals, calculating directional patterns of left and right channels at the audio-visual common origin, and respectively incorporating the input audio signals to generate output audio signals of the left and right channels; by the display, displaying at least a portion of the omnidirectional image; by the speaker, outputting the output audio signals of the left and right channels; and, when a user rotates the omnidirectional image displayed by the display, by the controller, using the at least one calibration reference point to calculate an offset angle of the rotation relative to the audio-visual common origin, adjusting the directional patterns according to the offset angle, and adjusting a non-linear weighting ratio of the input audio signals so that the output audio signals conform to the rotation of the omnidirectional image.

100‧‧‧electronic device

101‧‧‧camera

102‧‧‧omnidirectional microphone

103‧‧‧controller

104‧‧‧display

105‧‧‧speaker

110‧‧‧audio-visual capture device

120‧‧‧host

401, 402, 403, 404, 405, 406‧‧‧steps

A, B, C‧‧‧calibration reference points

D_L‧‧‧directional pattern of the left channel

D_R‧‧‧directional pattern of the right channel

X₁, X₂, X₃, X₄‧‧‧input audio signals

FIG. 1A is a schematic diagram of an electronic device 100 having an audio-visual processing function according to a first embodiment of the present invention.

FIG. 1B is a schematic diagram of an electronic device 100 having an audio-visual processing function according to a second embodiment of the present invention.

FIG. 2A is a schematic diagram showing a portion of an omnidirectional image.

FIG. 2B is a schematic diagram showing how the calibration reference points move when a user rotates a portion of the omnidirectional image.

FIG. 3A is a schematic diagram of the electronic device 100 according to the first embodiment of the present invention, which has four omnidirectional microphones that respectively capture the input audio signals X₁, X₂, X₃, and X₄.

FIG. 3B is a schematic diagram showing the directional patterns of the left and right channels being adjusted after the user rotates the omnidirectional image.

FIG. 4 is a flow chart of an audio-visual processing method according to the first or second embodiment of the present invention.

To make the above and other objects, features, and advantages of the present invention more readily understood, preferred embodiments are described in detail below with reference to the accompanying drawings.

FIG. 1A is a schematic diagram of an electronic device 100 having an audio-visual processing function according to a first embodiment of the present invention. The electronic device 100 includes an audio-visual capture device 110 and a host 120. It lets a user view omnidirectionally captured images and processes the sound dynamically in synchronization with the rotation of the image. The audio-visual capture device 110 may include cameras 101 and omnidirectional microphones 102 for capturing a plurality of images from each direction and a plurality of input audio signals from each direction. The host 120 may be a mobile device such as a smartphone, a notebook computer, or a tablet computer, or an electronic device such as a personal computer (PC) or a video conferencing system, but the present invention is not limited thereto.

In the first embodiment, the audio-visual capture device 110 may be attached externally to the host 120 by means of a stand, a selfie stick (not shown), or the like. It may be connected to the host 120 through an audio-visual transmission interface such as USB, High Definition Multimedia Interface (HDMI), or DisplayPort (DP); through a Digital Visual Interface (DVI) or Video Graphics Array (VGA) connector paired with an audio cable such as a TRS (Tip-Ring-Sleeve) or RCA (Radio Corporation of America) connector; or wirelessly through infrared, Bluetooth, Wi-Fi, or the like, in order to transmit the captured images and input audio signals to the host 120. In other embodiments, the cameras 101 and the omnidirectional microphones 102 may be separate devices, each connected independently to the host 120 through the wired or wireless means described above.

FIG. 1B is a schematic diagram of an electronic device 100 having an audio-visual processing function according to a second embodiment of the present invention. In this embodiment, elements with the same names as in the first embodiment have the same functions. The main difference between FIG. 1B and FIG. 1A is that the cameras 101 and the omnidirectional microphones 102 are built into the electronic device 100. The cameras 101 and the omnidirectional microphones 102 may each be disposed at an upper position of the electronic device 100, but the present invention is not limited thereto.

In FIGS. 1A and 1B, the cameras 101 may be a plurality of identical cameras, for example three cameras, each facing a different direction to capture a plurality of images of each direction. The omnidirectional microphones 102 may be N omnidirectional microphones, where N is a positive integer. Each has an omnidirectional pickup pattern and receives input audio signals X₁, X₂, ..., X_N from different angles. In this embodiment, the omnidirectional microphones 102 may be four identical omnidirectional microphones that respectively capture a plurality of input audio signals of each direction. In this embodiment, the electronic device 100 remains stationary while the cameras 101 capture the images.

In an embodiment, the host 120 includes a controller 103, a display 104, and a speaker 105. The controller 103 may be a controller such as a microprocessor, a digital signal processor (DSP), or an application-specific integrated circuit (ASIC). It receives the plurality of images of each direction and the plurality of input audio signals of each direction respectively captured by the cameras 101 and the omnidirectional microphones 102, and then stitches the images to generate an omnidirectional image, which is a circular or cylindrical 360-degree image. In other embodiments, the controller 103 may receive an omnidirectional image captured by a single camera, in which case no stitching is needed; the present invention is not limited in this respect. The controller 103 sets at least one calibration reference point in the omnidirectional image. A calibration reference point may be at least one object that is stationary or relatively unlikely to move, such as a table, a potted plant, a whiteboard, or a light fixture in the omnidirectional image, and it serves as the reference from which the offset angle is calculated when the user rotates the omnidirectional image. In addition, so that a person walking around during omnidirectional recording does not block the calibration reference point, the controller 103 may set a plurality of calibration reference points.

In an embodiment, FIG. 2A is a schematic diagram showing a portion of an omnidirectional image, in which the controller 103 sets cylinder A, cube B, and tetrahedron C as calibration reference points. Next, the controller 103 sets an audio-visual common origin of the omnidirectional image and the input audio signals. In an embodiment, when the electronic device 100 starts capturing images, the controller 103 sets the center of the picture shown on the display 104 as the audio-visual common origin, although the user may also set it arbitrarily; the present invention is not limited in this respect. By assigning the audio-visual common origin to both the omnidirectional image and the input audio signals from each direction, the controller 103 gives them a shared origin in a virtual coordinate system. Taking virtual spherical coordinates as an example, the common origin is the point at which the radial distance r, the zenith angle θ, and the azimuth angle φ are all 0 (r = 0, θ = 0, φ = 0), and it is used to calibrate the azimuth angles of the omnidirectional image and the input audio signals.
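The role of the common origin can be illustrated with a minimal sketch that expresses any pixel of the omnidirectional image as a (zenith, azimuth) offset from the origin. The function name `pixel_to_angles` and the equirectangular layout (full width spans 360 degrees, full height spans 180 degrees) are assumptions made for illustration; the source does not specify a projection.

```python
def pixel_to_angles(x, y, width, height, origin_x, origin_y):
    """Return (zenith, azimuth) offsets in degrees of pixel (x, y),
    measured from the audio-visual common origin pixel."""
    # Assumption: equirectangular frame, width = 360 deg, height = 180 deg.
    azimuth = (x - origin_x) * 360.0 / width
    zenith = (y - origin_y) * 180.0 / height
    # Wrap azimuth into [-180, 180) so left and right of the origin are signed.
    azimuth = (azimuth + 180.0) % 360.0 - 180.0
    return zenith, azimuth


# The origin pixel itself maps to (0, 0), matching theta = 0, phi = 0.
origin = pixel_to_angles(960, 480, 1920, 960, 960, 480)
# A pixel a quarter of the image width to the right is 90 degrees away.
quarter = pixel_to_angles(1440, 480, 1920, 960, 960, 480)
```

With a shared origin of this kind, the image and the microphone signals can be described by the same pair of angles.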

The controller 103 also calculates directional patterns of the left and right channels at the audio-visual common origin. The left and right channels are the two channels corresponding to the left and right of a user who faces the audio-visual common origin of the electronic device 100, and together they produce a stereo effect. Each directional pattern is a sound intensity function D(θ,φ,f) whose parameters are the zenith angle, the azimuth angle, and a plurality of frequencies. The frequencies may cover the range audible to the human ear, for example 20 to 20000 Hz, but the present invention is not limited thereto. When calculating the directional patterns at the audio-visual common origin, the controller 103 uses a preset sound model that includes the interaural time differences, the interaural level differences, and the frequency response (spectral filter) of the left and right ears. The controller 103 simulates, for signals of each direction and each frequency, the time difference and level difference between the left and right ears and the frequency response produced at both ears. On that basis it calculates the directional pattern of the left and right channels at each frequency at the audio-visual common origin and treats the result as the directional patterns based on the audio-visual common origin, namely the left-channel directional pattern D_L(0,0,f) and the right-channel directional pattern D_R(0,0,f). Next, refer to FIG. 3A, which is a schematic diagram of the electronic device 100 according to the first embodiment of the present invention, having four omnidirectional microphones that respectively capture the input audio signals X₁, X₂, X₃, and X₄. The controller 103 incorporates the input audio signals into the directional patterns of the left and right channels, respectively, to generate the output audio signals of the left and right channels at the audio-visual common origin.
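The source does not give formulas for the preset sound model or for D_L and D_R. As a hedged illustration only, the sketch below uses two stand-ins that are not taken from the patent: Woodworth's spherical-head approximation for the interaural time difference, and a frequency-independent cardioid as a hypothetical directional pattern aimed at each ear. The head radius, the speed of sound, and all function names are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature air
HEAD_RADIUS = 0.0875    # m, assumed average head radius


def interaural_time_difference(azimuth_deg):
    """Woodworth approximation of the ITD in seconds (|azimuth| <= 90 deg)."""
    phi = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (phi + math.sin(phi))


def cardioid_pattern(azimuth_deg, look_deg):
    """Hypothetical stand-in for D(theta, phi, f): gain in [0, 1] that peaks
    at look_deg. Zenith and frequency dependence are omitted here."""
    return 0.5 * (1.0 + math.cos(math.radians(azimuth_deg - look_deg)))


def d_left(azimuth_deg):
    # Left-channel pattern at the common origin, aimed 90 degrees to the left.
    return cardioid_pattern(azimuth_deg, -90.0)


def d_right(azimuth_deg):
    # Right-channel pattern at the common origin, aimed 90 degrees to the right.
    return cardioid_pattern(azimuth_deg, 90.0)
```

A real model would also depend on frequency and on the zenith angle, as the description of D(θ,φ,f) requires.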

In detail, as shown in FIG. 3A, based on the directional patterns D_L(0,0,f) and D_R(0,0,f) at the audio-visual common origin, different weights are applied to the input audio signals X₁, X₂, X₃, and X₄ of each direction, and the weighted signals are then combined into the output audio beams of the left and right channels at the audio-visual common origin, so that the user hears the sound arriving from each direction at the audio-visual common origin.
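The weighting and combining step can be sketched as a plain weighted sum. This is a simplified time-domain illustration with one weight per microphone, whereas the source applies the weights per frequency; the helper `mix_channel`, the sample values, and the weights are hypothetical.

```python
def mix_channel(inputs, weights):
    """Weighted sum of equally long sample sequences:
    y[n] = sum_i weights[i] * inputs[i][n]."""
    if len(inputs) != len(weights):
        raise ValueError("one weight per input signal is required")
    return [sum(w * x[n] for w, x in zip(weights, inputs))
            for n in range(len(inputs[0]))]


# Four hypothetical microphone signals X1..X4, two samples each.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# Hypothetical weights derived from the left and right directional patterns.
left = mix_channel(x, [0.5, 0.25, 0.25, 0.0])
right = mix_channel(x, [0.0, 0.25, 0.25, 0.5])
```

Each output channel uses the same four inputs; only the weights differ, which is what makes the two channels sound like different directions.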

The display 104 may be a liquid crystal display, an organic light-emitting diode display, or the like, and displays at least a portion of the omnidirectional image. The display 104 may have a touch function so that the user can operate the electronic device 100 directly through the touch display, for example to zoom in, zoom out, or rotate the omnidirectional image.

The speaker 105 may be a pair of stereo two-channel loudspeakers, stereo headphones, or the like, and outputs the output audio signals of the left and right channels for the user to hear.

Next, refer to FIG. 2B, which shows how the calibration reference points move when a user rotates a portion of the omnidirectional image on the display 104. As shown in FIG. 2B, cylinder A and cube B move to the lower right corner of the screen. Using the at least one calibration reference point, the controller 103 calculates an offset angle of the rotation relative to the audio-visual common origin; the offset angle includes a zenith angle (θ) and an azimuth angle (φ). Referring further to FIG. 3B, which shows the directional patterns of the left and right channels being adjusted after the user rotates the omnidirectional image, the controller 103 adjusts the directional patterns of the left and right channels according to the offset angle (θ, φ) and calculates the rotated directional pattern for each frequency, obtaining D_L(θ,φ,f) and D_R(θ,φ,f). Correspondingly, FIG. 3B shows the directional patterns of the left and right channels deflected clockwise. Then, for each frequency, the non-linear weighting ratio of the input audio signals is adjusted dynamically. In detail, the principle of a nonlinear adaptive beamformer may be used to incorporate the input audio signals X₁, X₂, X₃, and X₄ of the microphones, thereby adjusting the output audio signals so that they conform to the rotation of the omnidirectional image. In this way, through the audio-visual processing of the controller 103, the sound is processed dynamically in synchronization with the rotation of the image.
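A rough sketch of this rotation step follows, restricted to the azimuth angle. The offset is estimated as the circular mean of how far each visible calibration reference point has shifted, and the per-microphone weights are recomputed for the rotated look directions. The power-law weighting is only a simple non-linear stand-in and is not the nonlinear adaptive beamformer the source refers to, whose details are not given; all names, the microphone layout, and the `sharpness` parameter are assumptions.

```python
import math


def offset_azimuth(before_deg, after_deg):
    """View rotation in degrees, as the circular mean of (before - after)
    over the reference points. A hidden point is passed as None and skipped."""
    s = c = 0.0
    for b, a in zip(before_deg, after_deg):
        if b is None or a is None:
            continue
        d = math.radians(b - a)
        s += math.sin(d)
        c += math.cos(d)
    if s == 0.0 and c == 0.0:
        raise ValueError("no calibration reference point is visible")
    return math.degrees(math.atan2(s, c))


def channel_weights(mic_azimuths_deg, look_deg, sharpness=2.0):
    """Non-linear (power-law) weights for one channel, normalized to sum to 1."""
    raw = [(0.5 * (1.0 + math.cos(math.radians(m - look_deg)))) ** sharpness
           for m in mic_azimuths_deg]
    total = sum(raw)
    if total == 0.0:
        # Degenerate layout: fall back to equal weights.
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


def stereo_weights(mic_azimuths_deg, offset_deg):
    """Left and right weights after the patterns are rotated by offset_deg."""
    left = channel_weights(mic_azimuths_deg, offset_deg - 90.0)
    right = channel_weights(mic_azimuths_deg, offset_deg + 90.0)
    return left, right


# Hypothetical layout: microphones facing front, right, back, and left.
mics = [0.0, 90.0, 180.0, -90.0]
# Reference points A and B each appear 30 degrees further left; C is hidden.
offset = offset_azimuth([10.0, 50.0, None], [-20.0, 20.0, 5.0])
left0, right0 = stereo_weights(mics, 0.0)
left90, right90 = stereo_weights(mics, 90.0)
```

Before any rotation the left channel is dominated by the left-facing microphone; after a 90-degree rotation it is dominated by the front-facing one, which is the behavior the paragraph describes.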

FIG. 4 is a flow chart of an audio-visual processing method according to the first or second embodiment of the present invention, for an electronic device including a controller, a display, and a speaker. Referring also to FIGS. 1A and 1B, in step 401 the controller 103 receives a plurality of images of each direction and a plurality of input audio signals of each direction respectively captured by the cameras 101 and the omnidirectional microphones 102. In step 402, the controller 103 stitches the images to generate an omnidirectional image and sets at least one calibration reference point of the omnidirectional image. In step 403, the controller 103 sets an audio-visual common origin of the omnidirectional image and the input audio signals, calculates the directional patterns of the left and right channels at the audio-visual common origin, namely D_L(0,0,f) and D_R(0,0,f), and respectively incorporates the input audio signals to generate the output audio signals of the left and right channels.

In step 404, the display 104 displays at least a portion of the omnidirectional image. In step 405, the speaker 105 outputs the output audio signals of the left and right channels. In step 406, when a user rotates the omnidirectional image displayed by the display 104, the controller 103 uses the at least one calibration reference point to calculate an offset angle (θ, φ) of the rotation relative to the audio-visual common origin, adjusts the directional patterns of the left and right channels according to the offset angle to obtain D_L(θ,φ,f) and D_R(θ,φ,f), and adjusts the non-linear weighting ratio of the input audio signals so that the output audio signals conform to the rotation of the omnidirectional image.

In addition, in the audio-visual processing method of the present invention, the electronic device 100 further includes the cameras 101 and the omnidirectional microphones 102. The cameras 101 capture the images of each direction and the omnidirectional microphones 102 capture the input audio signals of each direction, and the electronic device remains stationary while the cameras capture the images.

Furthermore, in step 402, the omnidirectional image is a circular or cylindrical 360-degree image. In step 403, the directional pattern is a sound intensity function whose parameters are the zenith angle (θ), the azimuth angle (φ), and a plurality of frequencies. Note that in step 403 the directional patterns at the audio-visual common origin are calculated using a preset sound model that includes the time difference, the sound pressure difference, and the frequency response of the left and right ears. In step 406, the offset angle includes the zenith angle and the azimuth angle.

Therefore, the electronic device having an audio-visual processing function and the method of the present invention take the change of the sound into account and process it dynamically in synchronization with the rotation of the omnidirectional image, giving the user a more natural experience and an immersive audio-visual effect.

Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. Anyone skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention, and the scope of protection of the invention is therefore defined by the appended claims.

Claims

An electronic device having an audio-visual processing function, comprising: a controller that receives a plurality of images of each direction and a plurality of input audio signals of each direction respectively captured by a plurality of cameras and a plurality of omnidirectional microphones, stitches the plurality of images to generate an omnidirectional image, sets at least one calibration reference point of the omnidirectional image, sets an audio-visual common origin of the omnidirectional image and the input audio signals, calculates directional patterns of left and right channels at the audio-visual common origin, and respectively incorporates the input audio signals to generate output audio signals of the left and right channels; a display for displaying at least a portion of the omnidirectional image; and a speaker for outputting the output audio signals of the left and right channels; wherein, when a user rotates the omnidirectional image displayed by the display, the controller uses the at least one calibration reference point to calculate an offset angle of the rotation relative to the audio-visual common origin, adjusts the directional patterns according to the offset angle, and adjusts a non-linear weighting ratio of the input audio signals so that the output audio signals conform to the rotation of the omnidirectional image.

The electronic device of claim 1, further comprising: the cameras for capturing the images in all directions; and the omnidirectional microphones for capturing the input audio in each direction.

The electronic device of claim 1 or 2, wherein the offset angle comprises a zenith angle (θ) and an azimuth angle (φ), and the directional pattern is a sound intensity function having the zenith angle, the azimuth angle, and a plurality of frequencies as parameters.

The electronic device of claim 1, wherein, when the directional patterns at the audio-visual common origin are calculated, a preset sound model including the time difference, the sound pressure difference, and the frequency response of the left and right ears is used.

The electronic device of claim 2, wherein when the cameras capture the images, the electronic device is in a stationary state.

The electronic device of claim 1, wherein the omnidirectional image is a circular or cylindrical 360-degree image.

An audio-visual processing method for an electronic device including a controller, a display, and a speaker, the method comprising: by the controller, receiving a plurality of images of each direction and a plurality of input audio signals of each direction respectively captured by a plurality of cameras and a plurality of omnidirectional microphones, stitching the plurality of images to generate an omnidirectional image, setting at least one calibration reference point of the omnidirectional image, setting an audio-visual common origin of the omnidirectional image and the input audio signals, calculating directional patterns of left and right channels at the audio-visual common origin, and respectively incorporating the input audio signals to generate output audio signals of the left and right channels; by the display, displaying at least a portion of the omnidirectional image; by the speaker, outputting the output audio signals of the left and right channels; and, when a user rotates the omnidirectional image displayed by the display, by the controller, using the at least one calibration reference point to calculate an offset angle of the rotation relative to the audio-visual common origin, adjusting the directional patterns according to the offset angle, and adjusting a non-linear weighting ratio of the input audio signals so that the output audio signals conform to the rotation of the omnidirectional image.

The method for processing audio and video according to claim 7, wherein the offset angle comprises a zenith angle (θ) and an azimuth angle (φ), and the directional pattern is a sound intensity function having the zenith angle, the azimuth angle, and a plurality of frequencies as parameters.

The method for processing audio and video according to claim 7, further comprising: when calculating the directional patterns at the audio-visual common origin, using a preset sound model including the time difference, the sound pressure difference, and the frequency response of the left and right ears.

The method for processing audio and video according to claim 7, wherein when the cameras capture the images, the electronic device is in a stationary state.

The method for processing audio and video according to claim 7, wherein the omnidirectional image is a circular or cylindrical 360-degree image.