TWI787799B - Method and device for video and audio processing

Info

Publication number: TWI787799B
Application number: TW110115432A
Authority: TW
Inventors: 楊崇文; 吳根丞; 陳怡婷; 黃顯詔
Original assignee: 宏正自動科技股份有限公司
Priority date: 2021-04-28
Filing date: 2021-04-28
Publication date: 2022-12-21
Also published as: CN115250362B; TW202243456A; CN115250362A

Abstract

A method and a device for video and audio processing are provided. The method includes: receiving, by a video and audio processing unit, a plurality of original videos and an audio signal corresponding to each original video; generating, by the video and audio processing unit, at least one output video signal according to the original videos; adjusting, by the video and audio processing unit, each audio signal into a stereo audio signal according to layout information; and outputting the output video signal and the stereo audio signals from the video and audio processing unit to a video and audio playback device.

Description

Audio-visual processing method and audio-visual processing device thereof

The present disclosure relates to an audio-visual processing method and an audio-visual processing device thereof, and in particular to an audio-visual processing method with audio spatialization and an audio-visual processing device thereof.

With the development of technology, today's business models and forms of entertainment differ from those of the past, and these changes directly affect people's lives. For example, thanks to advances in network technology and the powerful audio and video playback capabilities of electronic products, a wave of online live streaming has emerged, along with a large number of live-streaming platforms and programs. Users can watch a live streamer's video and listen to the streamer's voice through electronic products, and can enjoy many different forms of entertainment depending on the live-streamed content.

However, when a user plays the video and sound of a live program on an electronic product (such as a TV, computer, mobile phone, or tablet), the electronic product cannot produce spatialized stereo sound effects according to the speaking position of the live streamer. The user therefore has no sense of presence while watching the live program, which greatly degrades the user experience.

In view of the above problems, the present invention provides an audio-visual processing device, in particular one that gives listeners a greater sense of presence.

An embodiment provides an audio-visual processing device adapted to be connected to an audio-visual playback device. The audio-visual processing device includes an image processing unit, a sound processing unit, and an output port. The sound processing unit is connected to the image processing unit. The output port is coupled to the image processing unit and the sound processing unit. The image processing unit receives a plurality of initial videos and generates at least one output video signal according to the plurality of initial videos. The sound processing unit receives layout information and an audio signal corresponding to each initial video, and adjusts each audio signal into a stereo audio signal according to the layout information. The output port outputs the stereo audio signals and the output video signal to the audio-visual playback device, so that the audio-visual playback device, by playing the output video signal, outputs a display screen in which the plurality of initial videos appear on the plurality of display blocks corresponding to the layout information.

An embodiment provides an audio-visual processing method with audio spatialization, applicable to an audio-visual processing device that is connected to an audio-visual playback device. The audio-visual processing method includes: receiving, by the audio-visual processing device, a plurality of initial videos and an audio signal corresponding to each initial video; generating, by the audio-visual processing device, at least one output video signal according to the plurality of initial videos; adjusting, by the audio-visual processing device, each audio signal into a stereo audio signal according to layout information; and outputting, by the audio-visual processing device, the at least one output video signal and the stereo audio signals to the audio-visual playback device, so that the audio-visual playback device, by playing the at least one output video signal, outputs a display screen in which the plurality of initial videos appear on the plurality of display blocks corresponding to the layout information.

In summary, according to any embodiment of the audio-visual processing method and the audio-visual processing device of the present disclosure, the audio-visual processing device can adjust the signal parameters of a video's audio signal according to the display position of that video. In some embodiments, the stereo audio signal differs from the audio signal in volume, delay, phase, frequency, or other signal parameters, in any combination. Alternatively, the stereo audio signal corresponds to the channels of the audio-visual playback device; in that case, the stereo audio signal output on different channels differs between channels in volume, delay, phase, frequency, or other signal parameters, in any combination. In this way, an audio spatialization effect in three-dimensional space can be established, allowing users to enjoy a stereoscopic audio-visual effect.

Referring to FIG. 1 or FIG. 2, in some embodiments the audio-visual processing device 1 may be connected between audio-visual capture devices 2 and audio-visual playback devices 3. Although the figures show two audio-visual capture devices 2 (audio-visual capture devices 21, 22) and two audio-visual playback devices 3, the numbers of audio-visual capture devices 2 and audio-visual playback devices 3 are not limited thereto.

Taking the audio-visual processing device 1 coupled to two audio-visual capture devices 21, 22 as an example, the audio-visual capture device 21 can shoot a target 41 to generate a video V1, and the video V1 includes a moving image of the target 41. The moving image of the target 41 is composed of a plurality of first image frames at a first frame rate. Likewise, the audio-visual capture device 22 can shoot a target 42 to generate a video V2, and the video V2 includes a moving image of the target 42. The moving image of the target 42 is composed of a plurality of second image frames at a second frame rate. The first frame rate and the second frame rate may be the same or different. While capturing the initial videos V1, V2, the audio-visual capture devices 21, 22 can also synchronously capture the sound of the target 41 and the sound of the target 42, and generate an audio signal A1 of the target 41 and an audio signal A2 of the target 42.

In one embodiment, the initial videos V1, V2 captured by the audio-visual capture devices 21, 22 may each be a single audio-visual stream (i.e., stored in one file) or two or more audio-visual streams (i.e., stored in two or more files).

Referring to FIG. 1 to FIG. 3, the audio-visual processing device 1 can receive the initial videos V1, V2 and the audio signals A1, A2 of the different targets 41, 42 from the different audio-visual capture devices 21, 22 (step S01). After receiving them, the audio-visual processing device 1 can generate at least one output video signal Vo according to the received initial videos V1, V2 (step S02), and adjust the audio signals A1, A2 into stereo audio signals S1, S2 according to layout information L1 (step S03). The audio-visual processing device 1 then outputs the output video signal Vo and the stereo audio signals S1, S2 to one or more audio-visual playback devices 3 (step S04). Each audio-visual playback device 3 can then play the output video signal Vo, and the display screen of each audio-visual playback device 3 shows the initial videos V1, V2 on the display blocks corresponding to the layout information L1.

Here, the layout information L1 may be the configuration state of the display layout of the output screen; that is, it defines the number of display blocks included in the output screen and the position and/or size of each display block. In the output screen, each display block can display the image frames of the initial video from a single corresponding image source. The output screen may refer to the display screen shown by the audio-visual playback device 3. In some embodiments, the output screen may also refer to the display screen shown on a display built into the audio-visual processing device 1 (not shown).

In addition, the audio-visual processing device 1 can designate different configuration positions in the display layout of the audio-visual playback device 3 at which the initial videos V1, V2 are displayed. In some embodiments, the layout information L1 includes configuration positions Lo1, Lo2 respectively corresponding to the initial videos V1, V2 from different sources. The configuration positions Lo1, Lo2 in the layout information L1 correspond one-to-one to the display blocks included in the output screen, and each configuration position Lo1, Lo2 defines the layout position of its corresponding block. In some embodiments, each configuration position Lo1, Lo2 corresponds to one image source. For example, in the layout information L1, an image source can be represented by a source number, and each configuration position Lo1, Lo2 can be indexed by the corresponding source number.
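As a rough illustration of layout information indexed by source number, the sketch below models L1 as a mapping from a source number to a display block. The `Block` class, its field names, and the use of normalized coordinates are assumptions made for this example; the source does not specify a data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical display block in normalized output-screen coordinates."""
    x: float       # left edge, as a fraction of the output width
    y: float       # top edge, as a fraction of the output height
    width: float   # block width, as a fraction of the output width
    height: float  # block height, as a fraction of the output height

    def center(self):
        """Return the (x, y) center of the block."""
        return (self.x + self.width / 2, self.y + self.height / 2)


# Assumed encoding of layout information L1: source number -> configuration
# position. Source 1 (video V1) takes the left half, source 2 (V2) the right.
layout_l1 = {
    1: Block(x=0.0, y=0.0, width=0.5, height=1.0),  # Lo1
    2: Block(x=0.5, y=0.0, width=0.5, height=1.0),  # Lo2
}

if __name__ == "__main__":
    for source_number, block in layout_l1.items():
        print(source_number, block.center())
```

Keying the mapping by source number mirrors the indexing described above: given a video's source, its configuration position is a single lookup.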

In some embodiments, referring to FIG. 1 or FIG. 2, the audio-visual processing device 1 includes an input unit 10, an image processing unit 11, a sound processing unit 12, and an output port 13. The input unit 10 is coupled to the image processing unit 11 and the sound processing unit 12. The image processing unit 11 is coupled to the sound processing unit 12. The output port 13 is coupled to the image processing unit 11 and the sound processing unit 12.

The input unit 10 is coupled to one or more audio-visual capture devices 21, 22, and the output port 13 is coupled to one or more audio-visual playback devices 3. The input unit 10 can receive the initial videos V1, V2 and the audio signals A1, A2 from the audio-visual capture devices 21, 22, then provide the received initial videos V1, V2 to the image processing unit 11 and the received audio signals A1, A2 to the sound processing unit 12.

Here, the image processing unit 11 can also receive the layout information L1. In one embodiment, the layout information L1 is obtained from a control signal C1 for setting the layout, which is generated when a user operates a user interface (not shown) of the audio-visual processing device 1.

In one embodiment, a storage unit can pre-store multiple sets of preset layout information. Each set indicates the configuration state (e.g., the number of sub-pictures, the image source of each sub-picture, the display position of each sub-picture, the size of each sub-picture, or any combination thereof) of a different display layout (e.g., picture-in-picture, side-by-side, floating, or stacked). The user operates the user interface (e.g., a touch screen, or a combination of a display and physical buttons) to generate a selection signal for the specific display layout to be used, and the image processing unit 11 then reads the corresponding preset layout information from the storage unit according to the selection signal and uses it as the layout information L1. In another embodiment, the storage unit can pre-store a single set of preset layout information, which is the configuration state of the default display layout of the display screen shown by the audio-visual playback device 3. At startup, the image processing unit 11 can read the preset layout information from the storage unit and use it as the layout information L1. When the user then operates the user interface to generate a setting signal, the image processing unit 11 obtains the corresponding layout information according to the setting signal and updates the layout information L1 currently in use.

In some embodiments, the image processing unit 11 can set, according to the layout information L1, the layout positions at which the initial videos V1, V2 are displayed on the audio-visual playback device 3.

For example, consider two image sources that respectively provide the initial videos V1, V2, and a display layout, indicated by the layout information L1, that has layout positions for two sub-pictures. In one embodiment, the display layout may be a side-by-side configuration, in which the display screen is divided into left and right sub-pictures. In this display layout, as shown in FIG. 4, the layout position of the initial video V1 and the layout position of the initial video V2 may respectively be a first position to the left of a preset reference point L2 and a second position to its right. The preset reference point L2 may be the center point of the display layout. For example, the whole layout is divided into left and right blocks in the X-Y plane; the first position may be the left layout position 31 (i.e., the left half block), located to the left of the preset reference point L2 in the X-Y plane, and the second position may be the right layout position 32 (i.e., the right half block), located to the right of the preset reference point L2 in the X-Y plane. In this case, according to the layout information L1, the image processing unit 11 can set the initial video V1 of the first image source to be displayed at the left layout position 31 and the initial video V2 of the second image source to be displayed at the right layout position 32.

In another embodiment, the display layout may be a stacked configuration, in which the display screen is divided into upper and lower sub-pictures. In this display layout, as shown in FIG. 5, the layout position of the initial video V1 and the layout position of the initial video V2 may respectively be a first position above the preset reference point L2 and a second position below it. The preset reference point L2 may be the center point of the display layout. For example, the whole layout is divided into upper and lower blocks in the X-Y plane; the first position may be the upper layout position 33 (i.e., the upper half block), located above the preset reference point L2 in the X-Y plane, and the second position may be the lower layout position 34 (i.e., the lower half block), located below the preset reference point L2 in the X-Y plane. In this case, according to the layout information L1, the image processing unit 11 can set the initial video V1 of the first image source to be displayed at the upper layout position 33 and the initial video V2 of the second image source to be displayed at the lower layout position 34.
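The side-by-side and stacked layouts described above can be sketched as a function that splits an output screen into two halves about its center point. The function name `split_layout`, the mode strings, and the `(x, y, width, height)` pixel-rectangle convention are all assumptions for illustration.

```python
def split_layout(width, height, mode):
    """Split a width x height output screen into two sub-picture blocks.

    Returns (first, second, reference_point), where each block is an
    (x, y, block_width, block_height) tuple in pixels and the reference
    point is the center of the display layout.
    """
    reference_point = (width / 2, height / 2)
    if mode == "side_by_side":
        half = width // 2
        first = (0, 0, half, height)              # left layout position
        second = (half, 0, width - half, height)  # right layout position
    elif mode == "stacked":
        half = height // 2
        first = (0, 0, width, half)               # upper layout position
        second = (0, half, width, height - half)  # lower layout position
    else:
        raise ValueError(f"unsupported layout mode: {mode!r}")
    return first, second, reference_point


if __name__ == "__main__":
    print(split_layout(1920, 1080, "side_by_side"))
    print(split_layout(1920, 1080, "stacked"))
```

In this sketch the first block would hold the initial video V1 and the second block the initial video V2.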

In yet another embodiment, the display layout may be a picture-in-picture configuration, in which the display screen is divided into one large and one small sub-picture. In this display layout, the layout position of the initial video V1 and the layout position of the initial video V2 may respectively be a first position with a larger area and a second position with a smaller area that overlaps the first position (when viewed from the front). In this case, according to the layout information L1, the image processing unit 11 can set the initial video V1 of the first image source to be displayed at the first position and the initial video V2 of the second image source to be displayed at the second position.

Referring to FIG. 1 to FIG. 3, the sound processing unit 12 receives the layout information L1 from the image processing unit 11, so that the sound processing unit 12 can adjust the audio signals A1, A2 into the stereo audio signals S1, S2 according to the configuration positions Lo1, Lo2 that correspond to the initial videos V1, V2 in the layout information L1 (step S03).

In some embodiments, there may be multiple output video signals Vo, corresponding one-to-one to the initial videos V1, V2. Here, each output video signal Vo is composed of the corresponding initial video V1 or V2. In this case, the output port 13 can also output the stereo audio signals S1, S2 as multiple audio streams, as shown in FIG. 1.

In a first embodiment of step S02, the image processing unit 11 can add source tags to the data streams of the initial videos V1, V2 to form the output video signals Vo, so that the source tag carried by each initial video V1, V2 tells the audio-visual playback device 3 the layout position at which to display that video. The source tag may correspond to, or be the same as, the data representing the image source in the layout information L1 currently in use. In other words, the image processing unit 11 can integrate each initial video V1, V2 with its corresponding configuration position Lo1, Lo2 into one stream (i.e., an output video signal Vo), and then output the output video signal Vo to the audio-visual playback device 3. Upon receiving the output video signal Vo, the audio-visual playback device 3 can display the image frames of the output video signal Vo at the corresponding display position in the display screen; that is, the positions in the display screen corresponding to the configuration positions Lo1, Lo2 respectively show the image frames of the initial videos V1, V2.

In a second embodiment of step S02, the image processing unit 11 can associate the source tags of the initial videos V1, V2 with the data representing the image sources in the layout information L1, so that the source tag carried by each initial video V1, V2 tells the audio-visual playback device 3 the layout position at which to display it. For example, before transmitting the output video signal Vo, the image processing unit 11 first transmits an association message to the audio-visual playback device 3, and then transmits the output video signal Vo to the audio-visual playback device 3. The association message records the correspondence between the source tags and the configuration positions Lo1, Lo2 of the layout information L1. The audio-visual playback device 3 then determines the configuration position Lo1, Lo2 of each initial video V1, V2 from the source tag of that video, the received association message, and the pre-stored layout information L1 currently in use, and displays the initial video V1, V2 at the corresponding layout position in the display screen according to that configuration position.
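The lookup performed on the playback side in this second embodiment can be sketched as two table lookups: source tag to configuration position via the association message, then configuration position to on-screen block via the stored layout information. The dictionary encodings, the tag strings, and the function name `resolve_layout_positions` are hypothetical; the source does not define a message format.

```python
def resolve_layout_positions(tagged_videos, association_message, layout_info):
    """Map each tagged video to the block where it should be displayed.

    tagged_videos: list of (source_tag, video_name) pairs.
    association_message: source_tag -> configuration position key.
    layout_info: configuration position key -> (x, y, width, height) block.
    """
    placements = {}
    for source_tag, video_name in tagged_videos:
        position_key = association_message[source_tag]
        placements[video_name] = layout_info[position_key]
    return placements


if __name__ == "__main__":
    # Assumed example data: two sources on a 1920 x 1080 side-by-side layout.
    association_message = {"tag-A": "Lo1", "tag-B": "Lo2"}
    layout_info = {"Lo1": (0, 0, 960, 1080), "Lo2": (960, 0, 960, 1080)}
    tagged_videos = [("tag-A", "V1"), ("tag-B", "V2")]
    print(resolve_layout_positions(tagged_videos, association_message, layout_info))
```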

In a third embodiment of step S02, the image processing unit 11 can add data representing the configuration positions Lo1, Lo2 to the data streams of the initial videos V1, V2, so that the audio-visual playback device 3 uses that data to determine the layout positions at which the initial videos V1, V2 are displayed. For example, before transmitting the initial videos V1, V2, the image processing unit 11 obtains the configuration positions Lo1, Lo2 from the layout information L1 and outputs them respectively to the two channels of the output port 13 over which the initial videos V1, V2 are to be transmitted. Specifically, for each initial video V1, V2, the image processing unit 11 first outputs the data representing its configuration position Lo1, Lo2 to the audio-visual playback device 3 over the channel corresponding to that video, and then outputs the initial video V1, V2 itself to the audio-visual playback device 3 over the same channel.

In other embodiments, there may be a single output video signal Vo, composed of the initial videos V1, V2. For example, the output video signal Vo may be an integrated video spliced together from the initial videos V1, V2. In this case, the output port 13 can output the stereo audio signals S1, S2 as a single audio stream, that is, a single stereo audio stream that integrates the stereo audio signals S1, S2 (hereinafter, the integrated audio signal So).

In a fourth embodiment of step S02, the image processing unit 11 can first assign the layout positions of the initial videos V1, V2 one-to-one to the layout positions Lo1, Lo2 in the layout information L1, and then, upon receiving the initial videos V1, V2, splice them into a single output video signal Vo according to the assigned layout positions Lo1, Lo2 (as shown in FIG. 3). Specifically, the image processing unit 11 takes the first image frame (of the initial video V1) and the second image frame (of the initial video V2) for the same playback time point and, according to their respective configuration positions Lo1, Lo2 in the layout information L1, combines them, or crops and then combines them, to form a third image frame (of the output video signal Vo). It then outputs the third image frame to the audio-visual playback device 3 via the output port 13. One part of the third image frame is the first image frame (the complete picture) or a portion of it, and another part of the third image frame is the second image frame (the complete picture) or a portion of it.
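The frame-splicing step can be sketched with frames held as nested lists of pixel values. This is a minimal illustration under assumed conventions (a frame is a list of rows, a placement is a top-left `(x, y)` offset, and pixels falling outside the canvas are cropped); the function name `compose_frame` is hypothetical and a real device would operate on hardware frame buffers.

```python
def compose_frame(width, height, placements, background=0):
    """Paste source frames onto a blank canvas to build one output frame.

    placements: list of (frame, (x0, y0)) pairs, where frame is a list of
    pixel rows and (x0, y0) is the top-left corner on the canvas. Pixels that
    fall outside the canvas are cropped. Later placements overwrite earlier
    ones, which also covers overlapping (picture-in-picture) layouts.
    """
    canvas = [[background] * width for _ in range(height)]
    for frame, (x0, y0) in placements:
        for dy, row in enumerate(frame):
            y = y0 + dy
            if not 0 <= y < height:
                continue
            for dx, pixel in enumerate(row):
                x = x0 + dx
                if 0 <= x < width:
                    canvas[y][x] = pixel
    return canvas


if __name__ == "__main__":
    first_frame = [[1, 1], [1, 1]]   # frame of V1 at one playback time point
    second_frame = [[2, 2], [2, 2]]  # frame of V2 at the same time point
    # Side-by-side splice into a 4 x 2 third frame.
    print(compose_frame(4, 2, [(first_frame, (0, 0)), (second_frame, (2, 0))]))
```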

Furthermore, in step S03, after adjusting the audio signals A1, A2 into the stereo audio signals S1, S2 according to the layout information L1, the sound processing unit 12 also combines the stereo audio signals S1, S2 by playback time into a single stereo audio stream (hereinafter, the integrated audio signal So), and then outputs the integrated audio signal So to each audio-visual playback device 3 via the output port 13. Upon receiving the output video signal Vo and the integrated audio signal So, the audio-visual playback device 3 plays them synchronously.
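One simple way to combine two stereo signals by playback time is to sum them sample by sample and clip the result. The source does not state how S1 and S2 are combined, so the summation, the clipping range of [-1.0, 1.0], and the `(left, right)` sample representation below are assumptions for illustration only.

```python
from itertools import zip_longest


def mix_stereo(first, second):
    """Mix two stereo signals into one by summing time-aligned samples.

    Each signal is a list of (left, right) float samples, where index i
    represents the same playback time in both signals. The shorter signal is
    padded with silence, and the sum is clipped to [-1.0, 1.0].
    """
    silence = (0.0, 0.0)
    mixed = []
    for (l1, r1), (l2, r2) in zip_longest(first, second, fillvalue=silence):
        left = max(-1.0, min(1.0, l1 + l2))
        right = max(-1.0, min(1.0, r1 + r2))
        mixed.append((left, right))
    return mixed


if __name__ == "__main__":
    s1 = [(0.5, 0.25), (0.75, -0.75)]  # stereo signal S1 (louder on the left)
    s2 = [(0.25, 0.5), (0.5, -0.5)]    # stereo signal S2 (louder on the right)
    print(mix_stereo(s1, s2))          # integrated audio signal So
```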

Then, referring to FIG. 1 (or FIG. 2) and FIG. 3, the sound processing unit 12 outputs, via the output port 13, one or more output video signals Vo composed of the initial videos V1, V2, together with the stereo audio signals S1, S2 (either as separate stereo audio signals S1, S2 or as the integrated audio signal So that combines them), to the audio-visual playback device 3 (step S04). The audio-visual playback device 3 can then synchronously play the initial videos V1, V2 of the targets 41, 42 and the stereo audio signals S1, S2. Here, the playback effect of the stereo audio signal S1 corresponds to the layout position at which the initial video V1 is displayed on the audio-visual playback device 3, and the playback effect of the stereo audio signal S2 corresponds to the layout position at which the initial video V2 is displayed on the audio-visual playback device 3.

In this way, the user perceives stereo audio signals S1, S2 that correspond to the different layout positions of the initial videos V1, V2 on the audio-visual playback device 3, which gives the user a spatialized audio effect.

In some embodiments, each configuration position Lo1, Lo2 may be coordinate information. In other words, the layout information L1 includes coordinate information representing the layout positions of the initial videos V1, V2. The layout information L1 further includes coordinate information of the preset reference point L2. Here, the preset reference point L2 refers to the preset position of the viewer of the output screen. In some embodiments, where the layout of the display screen is two-dimensional (as shown in FIG. 4 and FIG. 5, all configuration positions Lo1, Lo2 of the output screen lie in the same XY plane), the preset reference point L2 may be the center point of the layout. In other embodiments, where the layout of the display screen is three-dimensional (as shown in FIG. 6, the configuration positions Lo1, Lo2 of the output screen are parallel to each other and lie in multiple XY planes), the preset reference point L2 may be the center point of the perpendicular projection of all layout positions in the layout onto the plane of the front layout position, or onto any parallel plane in front of the front layout position.

In some embodiments of step S03, taking the two initial videos V1, V2 as an example, the sound processing unit 12 can determine an audio adjustment value (hereinafter, the first audio adjustment value) according to the coordinate information in the layout information L1 that represents the layout position of the initial video V1 and the coordinate information of a preset reference position (e.g., the preset reference point L2). Likewise, the sound processing unit 12 can determine another audio adjustment value (hereinafter, the second audio adjustment value) according to the coordinate information in the layout information L1 that represents the layout position of the initial video V2 and the coordinate information of the preset reference point L2. The sound processing unit 12 then adjusts the audio signal A1 into the stereo audio signal S1 according to the first audio adjustment value, and adjusts the audio signal A2 into the stereo audio signal S2 according to the second audio adjustment value.

In some embodiments of step S03, taking the two initial videos V1 and V2 as an example, the sound processing unit 12 uses the coordinate information corresponding to the initial video V1 and the coordinate information of the preset reference point L2 to calculate the relative distance between the layout position of the initial video V1 and the preset reference point L2. It then determines the audio adjustment value of the audio signal A1 of the initial video V1 according to that relative distance, and uses the determined audio adjustment value to adjust the audio signal A1 and generate the stereo audio signal S1. Similarly, the sound processing unit 12 calculates a relative distance from the coordinate information corresponding to the initial video V2 and the coordinate information of the preset reference point L2, determines the audio adjustment value of the audio signal A2 of the initial video V2, and adjusts the audio signal A2 accordingly to generate the stereo audio signal S2.

In some embodiments, the relative distance may be proportional to the audio adjustment value. That is, the closer the layout position of an initial video V1, V2 is to the preset reference point L2, the smaller the audio adjustment value generated by the sound processing unit 12, and the less adjustment is needed when generating the corresponding stereo audio signal S1, S2. Conversely, the farther the layout position of an initial video V1, V2 is from the preset reference point L2, the larger the audio adjustment value generated by the sound processing unit 12, and the more adjustment the corresponding stereo audio signal S1, S2 requires.
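The proportional relationship described above can be sketched as follows. This is only an illustration: the function name `audio_adjustment_value` and the constant `scale` are assumptions, since the description states proportionality but does not give a concrete mapping.

```python
def audio_adjustment_value(relative_distance: float, scale: float = 0.5) -> float:
    """Return an adjustment value proportional to the relative distance.

    `scale` is a hypothetical proportionality constant, not a value
    taken from the description.
    """
    if relative_distance < 0:
        raise ValueError("relative_distance must be non-negative")
    return scale * relative_distance


# A layout position near the reference point gets a small adjustment value,
# a distant one gets a larger adjustment value.
near = audio_adjustment_value(2.0)
far = audio_adjustment_value(8.0)
```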

In some embodiments, the coordinate information corresponding to the initial videos V1 and V2 may include designated-point coordinate information and side-length information. The designated-point coordinate information defines where a designated point of the layout position of each initial video V1, V2 lies in the display layout. The designated point of each initial video V1, V2 may be set to any point of the layout position according to actual needs. The side-length information defines the side lengths or size of the layout position of each initial video V1, V2.

In some embodiments, assuming the layout positions of the initial videos V1 and V2 are rectangular, the side-length information may be width information W1, W2 and length information L3, L4. The width information W1, W2 and the length information L3, L4 respectively define the two side lengths of the layout positions of the initial videos V1, V2 (as shown in FIG. 4).

In one embodiment, the designated point may be an origin. Taking rectangular layout positions for the initial videos V1 and V2 as an example, the coordinate information corresponding to the initial videos V1, V2 may include origin coordinate information O1, O2, width information W1, W2, and length information L3, L4. The origin of the layout position of each initial video V1, V2 may be any point on the edge of that layout position. Preferably, the origin may be a vertex of the layout position. Here, the width information W1, W2 and the length information L3, L4 may respectively define the two sides of the layout position that meet at the origin.

In another embodiment, the designated point may be a center point. Taking rectangular layout positions for the initial videos V1 and V2 as an example, the coordinate information corresponding to the initial videos V1, V2 may include center point coordinate information P1, P2, width information W1, W2, and length information L3, L4. The center point coordinate information P1, P2 defines where the center point of the layout position of each initial video V1, V2 lies in the display layout.

In some embodiments, when the layout of the display screen is two-dimensional (as shown in FIG. 4 and FIG. 5), the designated-point coordinate information and the coordinate information of the preset reference point L2 may each include vertical data and horizontal data. In some embodiments, the vertical data and the horizontal data may be an X coordinate value and a Y coordinate value, respectively.

In some embodiments, as shown in FIG. 4 and FIG. 5, the layout of the display screen lies on the X-Y plane. In the layout information L1, the coordinate information corresponding to the initial video V1 may include the origin coordinate information O1 of the layout position of the initial video V1 on the X-Y plane, together with the width information W1 and the length information L3 relative to the origin coordinate information O1. The coordinate information corresponding to the initial video V2 may include the origin coordinate information O2 of the layout position of the initial video V2 on the X-Y plane, together with the width information W2 and the length information L4 relative to the origin coordinate information O2. For example, referring to FIG. 4, suppose the layout of the display screen splits the whole layout into a left layout position 31 and a right layout position 32. The origin coordinate information O1 and the origin coordinate information O2 may respectively be the coordinates of a vertex of the left layout position 31 (that is, the origin of the left layout position 31) and the coordinates of a vertex of the right layout position 32 (that is, the origin of the right layout position 32). The origin coordinate information O2 lies to the right of the origin coordinate information O1. The width information W1 and the length information L3 are the two side lengths of the left layout position 31 that meet at its origin. The width information W2 and the length information L4 are the two side lengths of the right layout position 32 that meet at its origin. As another example, suppose the layout of the display screen splits the whole layout into an upper layout position 33 and a lower layout position 34. The origin coordinate information O1 and the origin coordinate information O2 may respectively be the coordinates of a vertex of the upper layout position 33 (that is, the origin of the upper layout position 33) and the coordinates of a vertex of the lower layout position 34 (that is, the origin of the lower layout position 34), and the origin coordinate information O2 lies below the origin coordinate information O1. The width information W1 and the length information L3 are the two side lengths of the upper layout position 33 that meet at its origin. The width information W2 and the length information L4 are the two side lengths of the lower layout position 34 that meet at its origin. It should be understood that although a vertex of the display block of each sub-picture is used here as the origin of the layout position, the disclosure is not limited to this; in practice, the same relative position (that is, any point) of the display block of each sub-picture may be chosen as the origin of the layout position. Preferably, the origin of each layout position may be any vertex of the display block.

In some embodiments of step S03, the sound processing unit 12 may calculate the center point coordinate information P1, P2 of the layout positions of the initial videos V1, V2 from the origin coordinate information O1, O2, the width information W1, W2, and the length information L3, L4. Next, the sound processing unit 12 calculates the relative distance between the layout position of each initial video V1, V2 and the preset reference point L2 (that is, the absolute distance between the center point and the preset reference point L2) from the center point coordinate information P1, P2 and the coordinate information of the preset reference point L2. The sound processing unit 12 then determines the audio adjustment values of the initial videos V1, V2 according to the relative distances, and uses the determined audio adjustment values to adjust the corresponding audio signals A1, A2 and generate the stereo audio signals S1, S2.

In one embodiment, the sound processing unit 12 may substitute the origin coordinate information O1, the width information W1, and the length information L3 into a center-point formula to calculate the center point coordinate information P1 of the layout position of the initial video V1 (for example, the X coordinate value and Y coordinate value of the center point of the left layout position 31 shown in FIG. 4). Next, the sound processing unit 12 substitutes the center point coordinate information P1 and the coordinate information of the preset reference point L2 into a distance formula to calculate the relative distance on the X-Y plane between the layout position of the initial video V1 and the preset reference point L2 (hereinafter, the first distance). The sound processing unit 12 then determines the first audio adjustment value according to the first distance. Similarly, the sound processing unit 12 may substitute the origin coordinate information O2, the width information W2, and the length information L4 into the center-point formula to calculate the center point coordinate information P2 of the layout position of the initial video V2 (for example, the X coordinate value and Y coordinate value of the center point of the right layout position 32 shown in FIG. 4). Next, the sound processing unit 12 substitutes the center point coordinate information P2 and the coordinate information of the preset reference point L2 into the distance formula to calculate the relative distance on the X-Y plane between the layout position of the initial video V2 and the preset reference point L2 (hereinafter, the second distance). The sound processing unit 12 then determines the second audio adjustment value according to the second distance.
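The center-point and distance calculations above can be sketched as follows. The description does not give the formulas, so this sketch makes several assumptions: the origin is the vertex with the smallest X and Y values, the width runs along X and the length along Y, and the distance is Euclidean. The class name `LayoutPosition`, the 1920 x 1080 layout, and the reference coordinates are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutPosition:
    """Rectangular layout position (hypothetical structure)."""
    origin_x: float  # X value of the origin (O1 / O2), assumed to be a vertex
    origin_y: float  # Y value of the origin (O1 / O2)
    width: float     # W1 / W2, assumed to run along the X axis
    length: float    # L3 / L4, assumed to run along the Y axis

    def center(self) -> tuple[float, float]:
        """Center point (P1 / P2) under the vertex-origin assumption."""
        return (self.origin_x + self.width / 2, self.origin_y + self.length / 2)


def relative_distance(position: LayoutPosition, reference: tuple[float, float]) -> float:
    """Euclidean distance on the X-Y plane from the center point to the reference point."""
    cx, cy = position.center()
    return math.hypot(cx - reference[0], cy - reference[1])


# Illustrative 1920 x 1080 layout split into left and right halves (as in FIG. 4).
left = LayoutPosition(0, 0, 960, 1080)
right = LayoutPosition(960, 0, 960, 1080)
reference_point = (960.0, 540.0)  # center of the layout, standing in for L2

first_distance = relative_distance(left, reference_point)
second_distance = relative_distance(right, reference_point)
```

With a reference point at the center of a symmetric split, both distances come out equal, so in such a layout the distance alone does not distinguish left from right.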

When the coordinate information already contains the center point coordinate information P1, P2, in some embodiments of step S03 the sound processing unit 12 may directly calculate the relative distance between the layout position of each initial video V1, V2 and the preset reference point L2 from the center point coordinate information P1, P2 and the coordinate information of the preset reference point L2. The sound processing unit 12 then determines the audio adjustment values of the initial videos V1, V2 according to the relative distances, and uses the determined audio adjustment values to adjust the corresponding audio signals A1, A2 and generate the stereo audio signals S1, S2.

In some embodiments, the sound processing unit 12 need not actually calculate the relative distance between the layout position of each initial video V1, V2 and the preset reference point L2. Instead, it may obtain a pre-stored audio adjustment value by comparing the configuration positions Lo1, Lo2 of the initial videos V1, V2 with the preset reference point L2.

In some embodiments of step S03, the sound processing unit 12 may compare the configuration positions Lo1, Lo2 of the initial videos V1, V2 with the coordinate information of the preset reference point L2 (for example, compare the origin coordinate information O1, O2 with the coordinate information of the preset reference point L2) to determine whether the layout position of each initial video V1, V2 is a first position or a second position. It then obtains, from the pre-stored audio adjustment values, the audio adjustment value corresponding to the determination result, and uses the obtained audio adjustment value to adjust the corresponding audio signal A1, A2 and generate the stereo audio signal S1, S2.

Taking the two initial videos V1 and V2 as an example, when the sound processing unit 12 determines that the layout position of the initial video V1 is the first position and the layout position of the initial video V2 is the second position, the sound processing unit 12 may determine the first audio adjustment value according to a preset correspondence between the first position and an audio adjustment value, and determine the second audio adjustment value according to a preset correspondence between the second position and an audio adjustment value. For example, a storage unit stores a correspondence table of various layout positions and their corresponding audio adjustment values. The sound processing unit 12 searches the correspondence table for the layout positions that match the first position and the second position. It then reads from the storage unit the audio adjustment value corresponding to the layout position that matches the first position as the first audio adjustment value, and reads from the storage unit the audio adjustment value corresponding to the layout position that matches the second position as the second audio adjustment value. Conversely, when the sound processing unit 12 determines that the layout position of the initial video V1 is the second position and the layout position of the initial video V2 is the first position, the sound processing unit 12 may determine the first audio adjustment value according to the preset correspondence between the second position and an audio adjustment value, and determine the second audio adjustment value according to the preset correspondence between the first position and an audio adjustment value. In this way, the sound processing unit 12 avoids the computation needed to actually calculate the relative distance.
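The correspondence table can be pictured as a simple mapping from layout position to a pre-stored adjustment value. The sketch below is an assumption about how such a table might look; the keys, field names, and numbers are placeholders and do not come from the description.

```python
# Hypothetical correspondence table held by the storage unit.
# The numeric values are placeholders, not values from the description.
ADJUSTMENT_TABLE = {
    "left": {"volume_ratio": 0.5, "delay_ms": 0.6},
    "right": {"volume_ratio": 0.5, "delay_ms": 0.6},
}


def lookup_adjustment(layout_position: str) -> dict:
    """Read the pre-stored adjustment value for a layout position."""
    try:
        return ADJUSTMENT_TABLE[layout_position]
    except KeyError:
        raise ValueError(f"no pre-stored adjustment for {layout_position!r}") from None


# V1 judged to be at the first position ("left"), V2 at the second ("right").
first_adjustment = lookup_adjustment("left")
second_adjustment = lookup_adjustment("right")
```

The lookup replaces the center-point and distance arithmetic with a single table read, which is the saving the paragraph above refers to.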

For example, suppose the first position is the left layout position 31 shown in FIG. 4 and the second position is the right layout position 32 shown in FIG. 4. The sound processing unit 12 compares the horizontal data of the origin coordinate information O1, O2 corresponding to the initial videos V1, V2 with the horizontal data of the coordinate information of the preset reference point L2. When the horizontal data of the origin coordinate information O1, O2 is greater than the horizontal data of the coordinate information of the preset reference point L2, the sound processing unit 12 determines that the layout position of that initial video V1, V2 lies to the right of the preset reference point L2, that is, the layout position is the right layout position 32. Conversely, when the horizontal data of the origin coordinate information O1, O2 is not greater than the horizontal data of the coordinate information of the preset reference point L2, the sound processing unit 12 determines that the layout position of that initial video V1, V2 lies to the left of the preset reference point L2, that is, the layout position is the left layout position 31.

In another example, suppose the first position is the upper layout position 33 shown in FIG. 5 and the second position is the lower layout position 34 shown in FIG. 5. The sound processing unit 12 compares the vertical data of the origin coordinate information O1, O2 corresponding to the initial videos V1, V2 with the vertical data of the coordinate information of the preset reference point L2. When the vertical data of the origin coordinate information O1 is greater than the vertical data of the coordinate information of the preset reference point L2, the sound processing unit 12 determines that the layout position of the initial video V1 lies above the preset reference point L2, that is, the layout position of the initial video V1 is the upper layout position 33. Conversely, when the vertical data of the origin coordinate information O2 is not greater than the vertical data of the coordinate information of the preset reference point L2, the sound processing unit 12 determines that the layout position of the initial video V2 lies below the preset reference point L2, that is, the layout position of the initial video V2 is the lower layout position 34.
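The two comparisons described above reduce to a single greater-than test per axis. The sketch below assumes that larger horizontal values lie further right and larger vertical values lie further up; the function names and sample coordinates are illustrative.

```python
def classify_horizontal(origin_horizontal: float, reference_horizontal: float) -> str:
    """'right' when the origin's horizontal data exceeds the reference's, else 'left'."""
    return "right" if origin_horizontal > reference_horizontal else "left"


def classify_vertical(origin_vertical: float, reference_vertical: float) -> str:
    """'upper' when the origin's vertical data exceeds the reference's, else 'lower'."""
    return "upper" if origin_vertical > reference_vertical else "lower"


# Illustrative values: reference point at horizontal 960, vertical 540.
v1_side = classify_horizontal(100.0, 960.0)   # origin left of the reference
v2_side = classify_horizontal(1000.0, 960.0)  # origin right of the reference
v1_level = classify_vertical(800.0, 540.0)    # origin above the reference
v2_level = classify_vertical(540.0, 540.0)    # equal counts as "not greater"
```

Because the test uses the origin rather than the center point, the outcome depends on which vertex serves as the origin: an origin that coincides with the reference coordinate is classified as left (or lower).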

In some embodiments, each audio adjustment value includes adjustment information for at least one signal parameter of the corresponding audio signal A1 (or A2). The adjustment information may be, for example, volume information, delay information, frequency information, or phase information. The volume information is used to adjust the amplitude of the audio signal A1 (or A2). The delay information is used to adjust the delay time of the audio signal A1 (or A2). The frequency information is used to adjust the frequency components of the audio signal A1 (or A2). The phase information is used to adjust the phase of the audio signal A1 (or A2).

In some embodiments, take as an example the case where the audio-visual playback device 3 includes at least two channels (for example, a left channel and a right channel), the layout position of the initial video V1 is the left layout position 31, and the layout position of the initial video V2 is the right layout position 32. To create a stereo audio signal S1 with a stereo sound effect along the X-axis direction, the first audio adjustment value may include volume information and/or delay information. In an embodiment of step S03, the sound processing unit 12 may determine the volume information and/or delay information of the first audio adjustment value according to the coordinate information of the left layout position 31 (that is, the origin coordinate information O1, the length information L3, and the width information W1). The sound processing unit 12 may then attenuate the volume of the audio signal A1 of the initial video V1 according to the volume information of the first audio adjustment value (for example, reduce the amplitude of the audio signal A1 by a specific value or a specific ratio), and add a delay time to the audio signal A1 according to the delay information of the first audio adjustment value, thereby generating the stereo audio signal S1 containing a first signal component and a second signal component. The attenuated and delayed audio signal A1 becomes the first signal component of the stereo audio signal S1, and the unattenuated, undelayed audio signal A1 may become the second signal component of the stereo audio signal S1. The sound processing unit 12 then sends the stereo audio signal S1 containing the first and second signal components to the audio-visual playback device 3. The audio-visual playback device 3 plays the first signal component of the stereo audio signal S1 through its right channel and the second signal component of the stereo audio signal S1 through its left channel.

Similarly, to create a stereo audio signal S2 with a stereo sound effect along the X-axis direction, the second audio adjustment value may include volume information and/or delay information. In an embodiment of step S03, the sound processing unit 12 may determine the volume information and/or delay information of the second audio adjustment value according to the coordinate information of the right layout position 32 (that is, the origin coordinate information O2, the length information L4, and the width information W2). The sound processing unit 12 may attenuate the volume of the audio signal A2 of the initial video V2 according to the volume information of the second audio adjustment value, and may further add a delay time to the audio signal A2 according to the delay information of the second audio adjustment value, thereby generating the stereo audio signal S2 containing a first signal component and a second signal component. The attenuated and delayed audio signal A2 becomes the first signal component of the stereo audio signal S2, and the unattenuated, undelayed audio signal A2 may become the second signal component of the stereo audio signal S2. The sound processing unit 12 then sends the stereo audio signal S2 containing the first and second signal components to the audio-visual playback device 3. The audio-visual playback device 3 plays the first signal component of the stereo audio signal S2 through its left channel and the second signal component of the stereo audio signal S2 through its right channel.

As a result, the user of the audio-visual playback device 3 perceives the main sound of the initial video V1 as coming from the left and the main sound of the initial video V2 as coming from the right. In other words, the stereo audio signal S1 of the initial video V1 may correspond to the layout position of the initial video V1 on the audio-visual playback device 3 (that is, the left layout position 31), and the stereo audio signal S2 of the initial video V2 may correspond to the layout position of the initial video V2 on the audio-visual playback device 3 (that is, the right layout position 32).

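In other words, the sound processing unit 12 may also adjust the volume (that is, the amplitude) and/or the delay time of the audio signals A1, A2 of the initial videos V1, V2 according to the left-right orientation of the designated point of each initial video V1, V2 relative to the preset reference point L2.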
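The left-right processing described above can be sketched on a list of samples. This is a minimal illustration, assuming the volume information is a ratio between 0 and 1 and the delay information is a whole number of samples; the function name and parameters are hypothetical.

```python
def spatialize_left_right(samples, volume_ratio, delay_samples, position):
    """Build (left_channel, right_channel) from a mono signal.

    The attenuated, delayed copy (first signal component) goes to the
    channel opposite the layout position; the untouched copy (second
    signal component) goes to the channel on the same side.
    `volume_ratio` and `delay_samples` are assumed encodings of the
    volume and delay information.
    """
    if not 0.0 <= volume_ratio <= 1.0:
        raise ValueError("volume_ratio must be between 0 and 1")
    if delay_samples < 0:
        raise ValueError("delay_samples must be non-negative")

    # First signal component: delayed by leading zeros, then attenuated.
    processed = [0.0] * delay_samples + [volume_ratio * s for s in samples]
    # Second signal component: original samples, padded to the same length.
    direct = list(samples) + [0.0] * delay_samples

    if position == "left":
        return direct, processed
    if position == "right":
        return processed, direct
    raise ValueError("position must be 'left' or 'right'")


# Initial video V1 at the left layout position: the left channel carries the
# untouched signal, the right channel carries the attenuated, delayed copy.
left_channel, right_channel = spatialize_left_right([1.0, -1.0], 0.5, 1, "left")
```

The louder, earlier copy on one side is what lets a listener localize the sound toward that side.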

In some embodiments, take as an example the case where the audio-visual playback device 3 includes at least two channels (hereinafter, the first channel and the second channel), the layout position of the initial video V1 is the upper layout position 33, and the layout position of the initial video V2 is the lower layout position 34. To create a stereo audio signal S1 with a stereo sound effect along the Y-axis direction, the first audio adjustment value may include volume information and/or frequency information. In an embodiment of step S03, the sound processing unit 12 may determine the volume information and/or frequency information of the first audio adjustment value according to the coordinate information of the upper layout position 33 (for example, the origin coordinate information O1, the length information L3, and the width information W1). The sound processing unit 12 may then attenuate the volume of the audio signal A1 of the initial video V1 according to the volume information of the first audio adjustment value, and may further adjust the frequency components of the audio signal A1 by equalization (EQ) according to the frequency information of the first audio adjustment value (for example, the sound processing unit 12 may suppress the high-frequency components of the audio signal A1), thereby generating the stereo audio signal S1 containing a first signal component and a second signal component. Here, the audio signal A1 after attenuation and frequency adjustment becomes the first signal component of the stereo audio signal S1, and the audio signal A1 without attenuation or frequency adjustment may become the second signal component of the stereo audio signal S1. The sound processing unit 12 then sends the stereo audio signal S1 containing the first and second signal components to the audio-visual playback device 3. The audio-visual playback device 3 plays the first signal component of the stereo audio signal S1 through its first channel and the second signal component of the stereo audio signal S1 through its second channel.

Similarly, to create a stereo audio signal S2 with a stereo sound effect along the Y-axis direction, the second audio adjustment value may include volume information and/or frequency information. In an embodiment of step S03, the sound processing unit 12 may determine the volume information and/or frequency information of the second audio adjustment value according to the coordinate information of the lower layout position 34 (for example, the origin coordinate information O2, the length information L4, and the width information W2). The sound processing unit 12 may then attenuate the volume of the audio signal A2 of the initial video V2 according to the volume information of the second audio adjustment value, and may further adjust the frequency components of the audio signal A2 by equalization according to the frequency information of the second audio adjustment value, thereby generating the stereo audio signal S2 containing a first signal component and a second signal component. The audio signal A2 after attenuation and frequency adjustment becomes the first signal component of the stereo audio signal S2, and the audio signal A2 without attenuation or frequency adjustment may become the second signal component of the stereo audio signal S2. The sound processing unit 12 then sends the stereo audio signal S2 containing the first and second signal components to the audio-visual playback device 3. The audio-visual playback device 3 plays the first signal component of the stereo audio signal S2 through its second channel and the second signal component of the stereo audio signal S2 through its first channel.

As a result, the user of the audio-visual playback device 3 perceives the main sound of the initial video V1 as coming from above and the main sound of the initial video V2 as coming from below. In other words, the stereo audio signal S1 of the initial video V1 may correspond to the layout position of the initial video V1 on the audio-visual playback device 3 (that is, the upper layout position 33), and the stereo audio signal S2 of the initial video V2 may correspond to the layout position of the initial video V2 on the audio-visual playback device 3 (that is, the lower layout position 34).

In other words, the sound processing unit 12 may also adjust the volume and/or the frequency components of the audio signal A1 (A2) of each initial video V1 (V2) according to whether the designated point corresponding to that initial video lies above or below the preset reference point L2.
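The vertical adjustment described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the description does not specify the equalization method, so a one-pole low-pass filter is used here as a stand-in, and the function names and the `gain` and `alpha` parameters are hypothetical.

```python
def one_pole_lowpass(samples, alpha):
    """Stand-in equalizer (an assumption, not the patent's filter).

    Implements y[n] = alpha * x[n] + (1 - alpha) * y[n-1].
    """
    out, prev = [], 0.0
    for x in samples:
        prev = alpha * x + (1.0 - alpha) * prev
        out.append(prev)
    return out


def make_vertical_stereo(samples, gain, alpha):
    """Return (first_component, second_component) of a stereo signal.

    first_component: attenuated by `gain`, then frequency-adjusted.
    second_component: the unmodified input signal.
    """
    first_component = one_pole_lowpass([gain * x for x in samples], alpha)
    second_component = list(samples)
    return first_component, second_component


# Hypothetical audio signal A2 and adjustment values.
first, second = make_vertical_stereo([1.0, 1.0], gain=1.0, alpha=0.5)
```

Which playback channel receives which component is decided separately, following the channel assignment given in the description.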

In some embodiments, when the layout of the display frame is a three-dimensional arrangement (as shown in FIG. 6), the designated point coordinate information and the coordinate information of the preset reference point L2 may each include vertical data, horizontal data, and depth data. In some embodiments, the vertical data, the horizontal data, and the depth data may be an X coordinate value, a Y coordinate value, and a Z coordinate value, respectively.

In some embodiments of step S03, the sound processing unit 12 may also calculate a depth distance from the depth data in the designated point coordinate information corresponding to each initial video V1, V2 and the depth data in the coordinate information of the preset reference point L2, and determine, according to the calculated depth distance, the phase information in the audio adjustment value of the audio signal A1, A2 of that initial video V1, V2. In other embodiments, the sound processing unit 12 may determine the layout position of each initial video V1, V2 by comparing the designated point coordinate information corresponding to the initial video V1, V2 with the coordinate information of the preset reference point L2, and then obtain, from a plurality of preset audio adjustment values (which contain phase information), the audio adjustment value (which contains phase information) corresponding to the determination result.

In some examples, the layout of the display frame is arranged in X-Y-Z space (as shown in FIG. 6) and the display frame is a picture-in-picture frame (as shown in FIG. 7). Referring to FIG. 6 and FIG. 7, the layout positions of the initial videos V1 and V2 may be a front layout position 35 and a rear layout position 36, which have different depth distances D1 and D2, respectively, from the preset reference point L2 along the Z axis. The layout position of the initial video V1 may be the front layout position 35, and the layout position of the initial video V2 may be the rear layout position 36.

In an embodiment of step S03, the sound processing unit 12 may also calculate the depth distance D1 from the depth data in the center point coordinate information P1 of the front layout position 35 and the depth data in the coordinate information of the preset reference point L2, and then determine the phase information in the first audio adjustment value according to the calculated depth distance D1 (e.g., the phase information may be proportional to the depth distance D1). Similarly, the sound processing unit 12 may calculate the depth distance D2 from the center point coordinate information P2 of the rear layout position 36 and the coordinate information of the preset reference point L2, and then determine the phase information in the second audio adjustment value according to the calculated depth distance D2 (e.g., the phase information may be proportional to the depth distance D2).
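The depth-distance calculation and the proportional phase mapping can be illustrated as follows. This is a minimal sketch, not the patented implementation: the `Point3D` type, the function names, and the proportionality constant `k` are assumptions introduced for illustration, and the unit of the resulting phase information is left unspecified, as in the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point3D:
    """Coordinate information with a depth component (hypothetical type)."""
    x: float
    y: float
    z: float  # depth data


def depth_distance(center, reference):
    """Depth distance between a center point and the reference point."""
    return abs(center.z - reference.z)


def phase_information(distance, k):
    """Phase information proportional to the depth distance.

    `k` is a hypothetical proportionality constant.
    """
    return k * distance


# Hypothetical coordinates for center point P1 and reference point L2.
p1 = Point3D(x=0.0, y=0.0, z=-2.0)
l2 = Point3D(x=0.0, y=0.0, z=0.0)
d1 = depth_distance(p1, l2)
phase1 = phase_information(d1, k=0.25)
```

Only the depth data enters the calculation here; the horizontal and vertical data would be used in the same way for the other axes.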

For example, suppose that the audio-visual playback device 3 includes a first channel and a second channel, that the layout position of the initial video V1 is the front layout position 35, and that the layout position of the initial video V2 is the rear layout position 36. To create a stereo audio signal S1 that has a stereo sound effect along the Z-axis direction, the first audio adjustment value may include volume information and phase information. In an embodiment of step S03, the sound processing unit 12 may determine the volume information and the phase information of the first audio adjustment value according to the coordinate information of the front layout position 35 (e.g., the origin coordinate information O1, the length information L3, and the width information W1). The sound processing unit 12 may then attenuate the volume of the audio signal A1 of the initial video V1 according to the volume information of the first audio adjustment value, and may further adjust the phase of the audio signal A1 according to the phase information of the first audio adjustment value (e.g., the sound processing unit 12 may adjust the phase of the audio signal A1 by inverting it), thereby generating the stereo audio signal S1, which includes a first signal component and a second signal component. The audio signal A1 that has been attenuated and phase-adjusted becomes the first signal component of the stereo audio signal S1, and the audio signal A1 that has been neither attenuated nor phase-adjusted becomes the second signal component of the stereo audio signal S1. The sound processing unit 12 then sends the stereo audio signal S1, including the first signal component and the second signal component, to the audio-visual playback device 3. The audio-visual playback device 3 plays the first signal component of the stereo audio signal S1 through its first channel and plays the second signal component of the stereo audio signal S1 through its second channel.

Similarly, to create a stereo audio signal S2 that has a stereo sound effect along the Z-axis direction, the second audio adjustment value may include volume information and phase information. In an embodiment of step S03, the sound processing unit 12 may determine the volume information and the phase information of the second audio adjustment value according to the coordinate information of the rear layout position 36 (e.g., the origin coordinate information O2, the length information L4, and the width information W2). The sound processing unit 12 may then attenuate the volume of the audio signal A2 of the initial video V2 according to the volume information of the second audio adjustment value, and may further adjust the phase of the audio signal A2 according to the phase information of the second audio adjustment value (e.g., the sound processing unit 12 may adjust the phase of the audio signal A2 by inverting it), thereby generating the stereo audio signal S2, which includes a first signal component and a second signal component. The audio signal A2 that has been attenuated and phase-adjusted becomes the first signal component of the stereo audio signal S2, and the audio signal A2 that has been neither attenuated nor phase-adjusted becomes the second signal component of the stereo audio signal S2. The sound processing unit 12 then sends the stereo audio signal S2, including the first signal component and the second signal component, to the audio-visual playback device 3. The audio-visual playback device 3 plays the first signal component of the stereo audio signal S2 through its second channel and plays the second signal component of the stereo audio signal S2 through its first channel. As a result, the user of the audio-visual playback device 3 perceives the main sounds of the initial videos V1 and V2, which are at layout positions of different depths, as being at different distances.
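The attenuate-and-invert step and the channel assignment for S1 and S2 can be sketched as follows. This is an illustrative sketch only: the function names, the `gain` values, and the `swap` flag are assumptions, and phase inversion is just the one example of phase adjustment that the description mentions.

```python
def make_depth_stereo(samples, gain, invert=True):
    """Return (first_component, second_component) of a stereo signal.

    first_component: attenuated by `gain` and, if `invert` is set,
    phase-inverted. second_component: the unmodified input signal.
    """
    sign = -1.0 if invert else 1.0
    first_component = [sign * gain * x for x in samples]
    second_component = list(samples)
    return first_component, second_component


def route_to_channels(first_component, second_component, swap):
    """Return (first_channel, second_channel) for playback.

    swap=False: first component on the first channel (as for S1).
    swap=True:  first component on the second channel (as for S2).
    """
    if swap:
        return second_component, first_component
    return first_component, second_component


# Hypothetical audio signals A1 and A2 and gain values.
s1 = make_depth_stereo([1.0, -0.5], gain=0.5)
s2 = make_depth_stereo([0.5, 0.5], gain=0.5)
s1_channels = route_to_channels(*s1, swap=False)
s2_channels = route_to_channels(*s2, swap=True)
```

Keeping the component generation separate from the routing mirrors the description, in which the same processing is applied to A1 and A2 and only the channel assignment differs.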

In another example, the first position is the front layout position 35 shown in FIG. 6, the second position is the rear layout position 36 shown in FIG. 6, and both the front layout position 35 and the rear layout position 36 are located behind the preset reference point L2. The sound processing unit 12 may compare the depth data of the origin coordinate information O1, O2 corresponding to the initial videos V1, V2 with the depth data of the coordinate information of the preset reference point L2 to learn the layout positions of the initial videos V1, V2. When the depth data corresponding to the initial video V1 is smaller than both the depth data of the coordinate information of the preset reference point L2 and the depth data corresponding to the initial video V2, the sound processing unit 12 determines that the layout position of the initial video V1 is the rearmost position relative to the preset reference point L2, that is, the layout position of the initial video V1 is the rear layout position 36, and then obtains the audio adjustment value corresponding to the rear layout position 36 from the plurality of preset audio adjustment values. Conversely, when the depth data corresponding to the initial video V1 is smaller than the depth data of the coordinate information of the preset reference point L2 but greater than the depth data corresponding to the initial video V2, the sound processing unit 12 determines that the layout position of the initial video V1 is behind the preset reference point L2 and in front of the layout position of the initial video V2, that is, the layout position of the initial video V1 is the front layout position 35, and then obtains the audio adjustment value corresponding to the front layout position 35 from the plurality of preset audio adjustment values.
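The comparison-and-lookup approach in this example can be sketched as a small table lookup. This is only a sketch under stated assumptions: the preset values, the dictionary keys, and the function names are hypothetical, and the handling of equal depth data, which the description does not address, is an arbitrary choice here.

```python
# Hypothetical preset audio adjustment values keyed by layout position.
PRESET_ADJUSTMENTS = {
    "front_35": {"volume": 0.8, "phase_inverted": False},
    "rear_36": {"volume": 0.5, "phase_inverted": True},
}


def classify_layout_position(own_depth, other_depth, reference_depth):
    """Classify a video as front (35) or rear (36) from depth data.

    Covers only the case in which both videos lie behind the
    reference point, i.e. both depth values are smaller than it.
    """
    if own_depth >= reference_depth or other_depth >= reference_depth:
        raise ValueError("sketch only covers positions behind the reference")
    # Smaller depth data means farther behind the reference point.
    # Equal depths fall through to "front_35" (an assumption).
    return "rear_36" if own_depth < other_depth else "front_35"


def lookup_adjustment(own_depth, other_depth, reference_depth):
    """Return the preset adjustment value for the classified position."""
    position = classify_layout_position(own_depth, other_depth, reference_depth)
    return PRESET_ADJUSTMENTS[position]


# Hypothetical depth data for V1, V2 and the reference point L2.
adjustment_v1 = lookup_adjustment(-3.0, -1.0, 0.0)
```

A lookup table avoids computing the adjustment from the distance at run time, at the cost of supporting only the predefined layout positions.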

In other words, the sound processing unit 12 may also adjust the volume and/or the phase of the audio signals A1, A2 of the initial videos V1, V2 according to the depth of the designated point corresponding to each initial video V1, V2 relative to the preset reference point L2.

In one embodiment, the first channel may be a left channel and the second channel may be a right channel. In another embodiment, the first channel may be an upper channel and the second channel may be a lower channel.

The embodiments above are merely examples of how an audio signal can be adjusted into a stereo audio signal. The adjustment may target any combination of sound parameters, such as volume, delay, phase, frequency, or others, so that the stereo audio signal better suits the actual conditions of use. For example, a stereo audio signal obtained by adjusting two, three, or more sound parameters of the audio signal, such as volume, delay, and phase, can let the user hear the sound as coming from one of the front left, front right, rear left, rear right, upper left, upper right, lower left, or lower right directions. In other words, one or more sound parameters can be adjusted as needed to give the listener a stronger sense of space.
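Combining several sound parameters per channel can be sketched as follows. This is an illustrative sketch, not a definitive implementation: the `AudioAdjustment` type, its fields, and the choice to express delay as a whole number of samples are all assumptions, and frequency adjustment is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioAdjustment:
    """Hypothetical per-channel adjustment value."""
    volume: float = 1.0        # linear gain
    delay_samples: int = 0     # delay expressed in samples
    invert_phase: bool = False


def apply_adjustment(samples, adj):
    """Apply volume, delay, and phase inversion to one channel.

    The delay is realized by prepending zeros, so the output is
    longer than the input by `delay_samples`.
    """
    sign = -1.0 if adj.invert_phase else 1.0
    delayed = [0.0] * adj.delay_samples
    return delayed + [sign * adj.volume * x for x in samples]


def spatialize(samples, first_adj, second_adj):
    """Return (first_channel, second_channel) with distinct parameters."""
    return (apply_adjustment(samples, first_adj),
            apply_adjustment(samples, second_adj))


# Hypothetical adjustment: the first channel is quieter, delayed, inverted.
first_channel, second_channel = spatialize(
    [1.0, 2.0],
    AudioAdjustment(volume=0.5, delay_samples=2, invert_phase=True),
    AudioAdjustment(),
)
```

Grouping the parameters in one value per channel makes it straightforward to adjust only one parameter or several at once, which is the flexibility the paragraph above describes.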

It should be understood that, although the foregoing embodiments are described with each audio-visual capture device 2 separately outputting the initial video V1 (or V2) and the audio signal A1 (or A2) to the audio-visual processing device 1, the present disclosure is not limited thereto. In some embodiments, the audio-visual capture device 2 may output a single audio-visual stream to the audio-visual processing device 1, and an audio-video separation unit (not shown) of the audio-visual processing device 1 then separates the input audio-visual stream into the video and the audio signal. The audio-video separation unit is coupled between the input unit 10 and the image processing unit 11 and between the input unit 10 and the sound processing unit 12.

Likewise, in some embodiments, the audio-visual processing device 1 may output a single audio-visual stream to the audio-visual playback device 3. Here, the audio-visual processing device 1 may use an audio-video synthesis unit (not shown) to combine the output video signal Vo of the initial videos V1, V2, which corresponds to the source marks or the configuration positions Lo1, Lo2, with the integrated audio signal So into a single audio-visual stream, and then output that stream to the audio-visual playback device 3 through the output port 13. The audio-video synthesis unit is coupled between the image processing unit 11 and the output port 13 and between the sound processing unit 12 and the output port 13.

In some embodiments, the image processing unit 11 and the sound processing unit 12 (as well as the audio-video separation unit and/or the audio-video synthesis unit) may be implemented by one or more processing units. Each processing unit may be, but is not limited to, any analog and/or digital device that manipulates signals based on operating instructions.

In some embodiments, the aforementioned storage unit may be coupled to the image processing unit 11 or built into the image processing unit 11. The storage unit may be implemented by one or more storage elements. Each storage element may be, for example, a memory or a register, but is not limited thereto.

In some embodiments, the input unit 10 may be implemented by a plurality of connection ports or by one or more network modules. Each connection port can be coupled, via a transmission line, to a device external to the audio-visual processing device 1 (i.e., an audio-visual capture device 2). The connection ports support audio-visual, video, or audio transmission. The network module may conform to any wireless or wired communication protocol.

In some embodiments, the output port 13 may be a physical audio-visual connector or a connector that supports any wireless communication protocol.

In some embodiments, when the audio-visual processing device 1 is coupled to a plurality of audio-visual playback devices 3, all of the audio-visual playback devices 3 may receive the same file from the audio-visual processing device 1; that is, each receives the same output video signal Vo and the same stereo audio signals S1, S2 (or So).

In some embodiments, the audio-visual processing method with audio spatialization according to the present invention may be implemented by a computer program product, so that the method of any embodiment is carried out after the audio-visual processing device 1 loads and executes the program. In other words, the audio-visual processing method with audio spatialization of any embodiment is implemented on the audio-visual processing device 1 by one or more processing units executing firmware or software algorithms. In some embodiments, the computer program product may be a readable recording medium, and the program is stored in the readable recording medium for the audio-visual processing device 1 to load. In some embodiments, the program itself may be the computer program product and may be transmitted to the audio-visual processing device 1 by wired or wireless means.

In summary, according to any embodiment of the audio-visual processing method and the audio-visual processing device of the present application, the audio-visual processing device 1 can adjust the signal parameters of the audio signal of a video according to the display position of that video. In some embodiments, the stereo audio signals S1, S2 (or So) differ from the audio signals A1, A2 in any combination of volume, delay, phase, frequency, or other parameters; alternatively, the stereo audio signals S1, S2 (or So) correspond to the respective channels of the audio-visual playback device 3, so that when they are output through different channels, they differ from one another in any combination of signal parameters such as volume, delay, phase, frequency, or others. In this way, an audio spatialization effect in three-dimensional space can be established, allowing users to enjoy stereoscopic audio-visual effects.

1: Audio-visual processing device
10: Input unit
11: Image processing unit
12: Sound processing unit
13: Output port
2, 21, 22: Audio-visual capture device
3: Audio-visual playback device
31: Left layout position
32: Right layout position
33: Upper layout position
34: Lower layout position
35: Front layout position
36: Rear layout position
41: Target
42: Target
V1, V2: Initial video
Vo: Output video signal
A1, A2: Audio signal
C1: Control signal
L1: Layout information
L2: Preset reference point
L3, L4: Length information
W1, W2: Width information
S1, S2: Stereo audio signal
So: Integrated audio signal
O1, O2: Origin coordinate information
P1, P2: Center point coordinate information
D1, D2: Depth distance
S01~S04: Steps

[FIG. 1] is a schematic block diagram of an embodiment in which the audio-visual processing device according to the present application is connected to audio-visual capture devices and an audio-visual playback device.
[FIG. 2] is a schematic block diagram of another embodiment in which the audio-visual processing device according to the present application is connected to audio-visual capture devices and an audio-visual playback device.
[FIG. 3] is a flowchart of an embodiment of the audio-visual processing method according to the present application.
[FIG. 4] is a schematic diagram of an embodiment of a display layout.
[FIG. 5] is a schematic diagram of another embodiment of the display layout.
[FIG. 6] is a schematic diagram of yet another embodiment of the display layout.
[FIG. 7] is a schematic diagram of the display frame presented by the display layout of FIG. 6.


Claims

1. An audio-visual processing method with audio spatialization, applicable to an audio-visual processing device connected to an audio-visual playback device, the audio-visual processing method comprising: receiving, by the audio-visual processing device, a plurality of initial videos and an audio signal corresponding to each of the initial videos; generating, by the audio-visual processing device, at least one output video signal according to the plurality of initial videos; adjusting, by the audio-visual processing device, each of the audio signals into a stereo audio signal according to layout information; and outputting, by the audio-visual processing device, the at least one output video signal and the plurality of stereo audio signals to the audio-visual playback device, so that the audio-visual playback device, by playing the at least one output video signal, outputs a display frame in which the plurality of initial videos are laid out on a plurality of display blocks corresponding to the layout information.

2. The audio-visual processing method with audio spatialization as described in claim 1, wherein the quantity of the at least one output video signal is one, a display layout of the audio-visual playback device includes a plurality of layout positions respectively corresponding to the plurality of display blocks, and the step of generating, by the audio-visual processing device, the at least one output video signal according to the plurality of initial videos comprises: arranging, by the audio-visual processing device, the plurality of initial videos at the plurality of layout positions according to the layout information to form the output video signal to be output.

3. The audio-visual processing method with audio spatialization as described in claim 1, wherein the at least one output video signal includes the plurality of initial videos, a display layout of the audio-visual playback device includes a plurality of layout positions respectively corresponding to the plurality of display blocks, and the step in which the audio-visual playback device, by playing the at least one output video signal, outputs the display frame in which the plurality of initial videos are laid out on the plurality of display blocks corresponding to the layout information comprises: displaying, by the audio-visual playback device, the plurality of initial videos at the plurality of layout positions according to the layout information.

4. The audio-visual processing method with audio spatialization as described in claim 1, wherein the layout information includes coordinate information corresponding to each of the initial videos, and the step of adjusting, by the audio-visual processing device, each of the audio signals into the stereo audio signal according to the layout information comprises: determining an audio adjustment value according to the coordinate information and a preset reference point; and adjusting the audio signal into the stereo audio signal according to the audio adjustment value.

5. The audio-visual processing method with audio spatialization as described in claim 4, wherein the step of determining the audio adjustment value according to the coordinate information and the preset reference point comprises: determining, according to each of the coordinate information, a center point of a layout position corresponding to one of the plurality of display blocks; and determining the audio adjustment value according to a relative distance between the center point and the preset reference point.

6. The audio-visual processing method with audio spatialization as described in claim 4, wherein the number of the plurality of display blocks is two, a display layout displayed by the audio-visual playback device includes a first position and a second position that respectively correspond to the plurality of display blocks and are defined relative to the preset reference point, and the step of determining the audio adjustment value according to the coordinate information and the preset reference point comprises: judging, according to the coordinate information and the preset reference point, whether a layout position of the initial video is the first position or the second position; and determining the audio adjustment value according to the judging result that the layout position is the first position or the second position.

7. The audio-visual processing method with audio spatialization as described in claim 1, wherein the layout information includes a configuration position corresponding to each of the initial videos, each configuration position is defined by designated point coordinate information and side length information, the designated point coordinate information defines the position, in a display layout of the audio-visual playback device, of a designated point of a layout position of the initial video, and the side length information defines the size of the layout position.

8. The audio-visual processing method with audio spatialization as described in claim 7, wherein each of the designated point coordinate information includes horizontal data, vertical data, depth data, or any combination thereof, and the step of adjusting, by the audio-visual processing device, each of the audio signals into the stereo audio signal according to the layout information comprises: determining volume information, delay information, phase information, frequency information, or any combination thereof according to the horizontal data, the vertical data, the depth data, or any combination thereof; and generating the stereo audio signal by adjusting the volume, delay time, phase, frequency, or other sound parameters of the audio signal according to the volume information, the delay information, the phase information, the frequency information, or any combination thereof.

9. An audio-visual processing device, suitable for connection to an audio-visual playback device, the audio-visual processing device comprising: an image processing unit, receiving a plurality of initial videos and generating at least one output video signal according to the plurality of initial videos; a sound processing unit, connected to the image processing unit, receiving layout information and an audio signal corresponding to each of the initial videos, and adjusting each of the audio signals into a stereo audio signal according to the layout information; and an output port, coupled to the image processing unit and the sound processing unit, outputting the stereo audio signals and the at least one output video signal to the audio-visual playback device, so that the audio-visual playback device, by playing the at least one output video signal, outputs a display frame in which the plurality of initial videos are laid out on a plurality of display blocks corresponding to the layout information.

10. The audio-visual processing device as described in claim 9, wherein the quantity of the at least one output video signal is one, a display layout displayed by the audio-visual playback device includes a plurality of layout positions respectively corresponding to the plurality of display blocks, and the image processing unit arranges the plurality of initial videos at the plurality of layout positions according to the layout information to form the output video signal to be output.

11. The audio-visual processing device as described in claim 9, wherein the layout information includes coordinate information corresponding to each of the initial videos, and the sound processing unit determines an audio adjustment value according to each of the coordinate information and a preset reference point, and generates the stereo audio signal by adjusting the corresponding audio signal according to each audio adjustment value.

12. The audio-visual processing device as described in claim 11, wherein the sound processing unit determines, according to each of the coordinate information, a center point of a layout position corresponding to one of the plurality of display blocks, and determines the audio adjustment value according to a relative distance between each of the center points and the preset reference point.

13. The audio-visual processing device as described in claim 11, wherein the number of the plurality of display blocks is two, a display layout displayed by the audio-visual playback device includes a first position and a second position that respectively correspond to the plurality of display blocks and are defined relative to a preset reference position, and the sound processing unit judges, according to the coordinate information and the preset reference point, whether a layout position of the initial video is the first position or the second position, so as to determine the audio adjustment value.

14. The audio-visual processing device as described in claim 9, wherein the layout information includes a configuration position corresponding to each of the initial videos, each configuration position is defined by designated point coordinate information and side length information, the designated point coordinate information defines the position, in a display layout of the audio-visual playback device, of a designated point of a layout position, and the side length information defines the size of the layout position.

15. The audio-visual processing device as described in claim 14, wherein each of the designated point coordinate information includes horizontal data, vertical data, depth data, or any combination thereof, and the sound processing unit determines, according to the horizontal data, the vertical data, the depth data, or any combination thereof, volume information, delay information, phase information, frequency information, or any combination thereof for adjusting the corresponding audio signal.