1308080

IX. Description of the Invention

[Brief Description of the Drawings]
The invention, together with further advantages thereof, may best be understood by reference to the following detailed description taken in conjunction with the accompanying drawings.
FIG. 1 illustrates a game environment in which a video game program may be executed for interaction with one or more users, in accordance with an embodiment of the present invention.
FIG. 2 illustrates a three-dimensional view of an example image-sound capture device, in accordance with an embodiment of the present invention.
FIG. 3A illustrates the processing of sound paths at different microphones that are designed to receive the input, and FIG. 3B illustrates logic for outputting the selected sound source, in accordance with an embodiment of the present invention.
FIG. 4 illustrates an example computing system interfaced with an image-sound capture device for processing input sound sources, in accordance with an embodiment of the present invention.
FIG. 5 illustrates an example in which multiple microphones are used to increase the precision with which the direction of particular sound sources is identified, in accordance with an embodiment of the present invention.
FIG. 6 illustrates an example in which sound is identified as lying within a particular spatial volume by using microphones in different planes, in accordance with an embodiment of the present invention.
FIGS. 7 and 8 illustrate exemplary method operations that may be performed while identifying sound sources and excluding non-focus sound sources, in accordance with an embodiment of the present invention.

[Description of Main Reference Numerals]
100: game environment
102: player
102': translucent image
103: game observer
104, 250: computing system
106: image-sound capture device
106a: sound capture unit
106b: image capture unit
108: monitor
110: display screen
112: game figure
114: interactive icon
116, 116a, 116b: sound source
117: non-speech sound
201, 201a, 201b, 202, 202a, 202b: sound path
210a: buffer 1
210b: buffer 2
212a, 212b: delay line
214: block
216: block (direction selection)
252: processor
254: bus
256: memory
258: interactive program
260: selective sound source listening logic (code)
270, 271: plane
274: focus volume
276a, 276b: other people
276c: speaker
302, 304, 306, 308, 310, 312, 314, 320, 340: operations
MIC1, MIC2, MIC3, MIC4: microphones

[Technical Field of the Invention]
The present invention relates to an apparatus and method for facilitating interaction with a computer program.

[Prior Art]
The video game industry has undergone many changes over the years.
As computing power has increased, video game developers have likewise created game software that takes advantage of it. To this end, video game developers have been coding games that combine sophisticated operations and mathematics to produce a very realistic gaming experience.

Example gaming platforms include the Sony PlayStation and the Sony PlayStation 2 (PS2), each of which is sold in the form of a game console. As is well known, a game console is designed to connect to a monitor (usually a television) and to let the user interact with the game through a handheld controller. The game console is designed with specialized processing hardware, including a central processing unit, a graphics synthesizer for processing intensive graphics operations, a vector unit for performing geometry transformations, and other glue hardware, firmware, and software. The game console is further designed with an optical disc tray for receiving game discs for local play through the console. Online gaming is also possible, in which a user can play interactively against or with other users over the Internet.

As game complexity continues to intrigue players, game and hardware manufacturers have continued to innovate to enable additional interactivity. In reality, however, the way users interact with a game has not changed dramatically over the years.

In view of the foregoing, there is a need for methods and systems that enable more advanced user interactivity with game entertainment.

[Summary of the Invention]
Broadly speaking, the present invention provides an apparatus and method for facilitating interaction with a computer program. In one embodiment, the computer program is a game program. However, the invention is not limited to game programs, and the apparatus and method can be applied to any computing environment that takes in sound input to trigger control, provide input, or enable communication.
More specifically, when sound is used to trigger control or input, embodiments of the present invention can filter the input down to particular sound sources. The filtered input is configured to ignore or omit sound sources that are not of interest. In a video game environment, the video game can respond with a specific reaction after processing the selected sound source, free of distortion or other extraneous noise. A game entertainment environment is typically exposed to considerable background noise, such as music, other people, and moving objects. Once the extraneous sound is substantially filtered out, the computer program can respond more reliably to the sound of interest. The response can take any form, such as a command, the initiation of an action, a selection, a change in game status or state, or the enabling of a feature.

In one embodiment, an apparatus for capturing image and sound during interaction with a computer program is provided. The apparatus includes an image capture unit configured to capture one or more image frames. The apparatus also includes a sound capture unit configured to identify one or more sound sources. The sound capture unit generates data that can be analyzed to determine a focus area. Sound within the focus area is then processed so that sound from outside the focus area is substantially excluded. The sound captured and processed for the focus area is then used to interact with the computer program.

In another embodiment, a method for selective sound source listening during interaction with a computer program is disclosed. The method includes receiving input from one or more sound sources at two or more sound capture microphones. The method then includes determining the delay paths from each of the sound sources and identifying the direction of each received input from each of the one or more sound sources.
The method further includes filtering out sound sources that are not in an identified direction of a focus area. The focus area is designated to supply the sound source used to interact with the computer program.

In yet another embodiment, a game system is provided. The game system includes an image-sound capture device configured to interface with a computing system that enables execution of an interactive computer game. The image-sound capture device includes video capture hardware that can be positioned to capture video from a focus area. The image-sound capture device also includes a microphone array for capturing sound from one or more sound sources. Each sound source is identified and associated with a direction relative to the image-sound capture device. The focus area associated with the video capture hardware is used to identify one of the sound sources that lies in that direction, in the proximity of the focus area.

In general, interactive sound identification and tracking can be applied to any computer program interfaced with any computing device. Once a sound source is identified, its content can be further processed to trigger, drive, direct, or control features or objects provided by a computer program.

Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, which illustrate the principles of the invention by way of example.

[Embodiments]
Methods and apparatus are disclosed for facilitating the identification of specific sound sources and for filtering out unwanted sound sources when sound is used as an interactive tool with a computer program.

In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without some or all of these specific details.
In other instances, well-known process steps have not been described in detail in order not to obscure the present invention.

FIG. 1 illustrates a game environment 100 in which a video game program may be executed for interaction with one or more users, in accordance with an embodiment of the present invention. As illustrated, player 102 is shown in front of a monitor 108 that includes a display screen 110. The monitor 108 is connected to a computing system 104. The computing system can be a standard computer system, a game console, or a portable computer system. In a specific example, without limitation to any brand, the game console can be one manufactured by Sony Computer Entertainment Inc., Microsoft, or any other manufacturer.

The computing system 104 is shown connected to an image-sound capture device 106. The image-sound capture device 106 includes a sound capture unit 106a and an image capture unit 106b. The player 102 is shown interacting with a game figure 112 on the display screen 110. In the video game being executed, input is provided at least in part by the player 102 through the image capture unit 106b and the sound capture unit 106a. As illustrated, the player 102 can move his hand to select an interactive icon 114 on the display screen 110. Once the image capture unit 106b captures an image of the player 102, a translucent image 102' of the player is projected onto the display screen 110. The player 102 can therefore see where to move his hand to select an icon or interact with the game figure 112. Techniques for capturing these movements and interactions can vary. Exemplary techniques are described in United Kingdom patent applications GB 0304024.3 (PCT/GB2004/000693) and GB 0304022.7 (PCT/GB2004/000703), both dated February 21, 2003, each of which is hereby incorporated by reference.
In the example shown, the interactive icon 114 is a "swing" icon that the player can select, causing the game figure 112 to swing the object being handled. In addition, the player 102 can give voice commands. These are captured by the sound capture unit 106a and then processed by the computing system 104 to interact with the video game being executed. As shown, sound source 116a is a "jump" voice command. Sound source 116a is captured by the sound capture unit 106a and processed by the computing system 104 to make the game figure 112 jump. Voice recognition can be used to identify the voice commands. Alternatively, the player 102 can communicate with remote users connected over the Internet or another network, who also participate directly or partially in the game interaction.

In accordance with an embodiment of the invention, the sound capture unit 106a is configured to include at least two microphones, which enable the computing system 104 to select sound coming from a particular direction. The computing system 104 filters out sound that does not come from the primary direction of game play (the focus). As a result, distracting sounds in the game environment 100 do not interfere with or confuse game execution while the player 102 is issuing specific commands. For example, the game player 102 may tap his foot, producing a tapping noise, which is a non-speech sound 117. Because the sound from the player's foot is not within the focus area of the video game, it is captured by the sound capture unit 106a but then filtered out.

As described below, the focus area is preferably identified from the active image area, that is, the area on which the image capture unit 106b is focused. Alternatively, the focus area can be selected manually from a set of areas presented to the user after an initialization step.
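The camera-derived focus area can be reduced to a horizontal angular window for illustration. The sketch below is hypothetical and rests on several assumptions. The microphone array and the camera share an optical axis, with 0 degrees straight ahead. Each sound source already has an estimated horizontal bearing. The camera's horizontal field of view is known. `FocusWindow`, `window_from_camera`, `window_from_zone`, and `keep_sources` are illustrative names, not part of the described system. A horizontal window alone would not reject the foot tap 117, which arrives from roughly the player's direction; that case needs elevation or volume information such as the focus volume of FIG. 6.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FocusWindow:
    """Horizontal angular window (degrees) treated as the focus area.

    0 degrees is straight ahead along the camera's optical axis (assumed);
    negative angles lie to one side and positive angles to the other.
    """
    center_deg: float
    half_width_deg: float

    def contains(self, bearing_deg: float) -> bool:
        return abs(bearing_deg - self.center_deg) <= self.half_width_deg


def window_from_camera(horizontal_fov_deg: float, margin_deg: float = 0.0) -> FocusWindow:
    """Focus area taken from the camera's active image area.

    Assumes the microphone array is aligned with the camera, so the visible
    scene spans +/- half the field of view around 0 degrees.
    """
    return FocusWindow(0.0, horizontal_fov_deg / 2.0 + margin_deg)


def window_from_zone(zones: List[Tuple[float, float]], choice: int) -> FocusWindow:
    """Manual alternative: the user picks one preset (left_deg, right_deg) zone."""
    left, right = zones[choice]
    return FocusWindow((left + right) / 2.0, (right - left) / 2.0)


def keep_sources(bearings: Dict[str, float], window: FocusWindow) -> Dict[str, float]:
    """Return only the sources whose bearing lies inside the focus window."""
    return {name: b for name, b in bearings.items() if window.contains(b)}


# Hypothetical bearings for the player (116a) and the observer (116b).
window = window_from_camera(horizontal_fov_deg=60.0)
print(keep_sources({"player_116a": 5.0, "observer_116b": 48.0}, window))
```

With an assumed 60-degree field of view, the player at about 5 degrees is kept and the observer at about 48 degrees is filtered out. The manual alternative simply swaps the camera-derived window for one of several preset zones chosen after initialization.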
Continuing the example of FIG. 1, a game observer 103 produces a sound source 116b that could disrupt the processing performed by the computing system during interactive game play. However, the game observer 103 is not within the active image area of the image capture unit 106b. Sound coming from the observer's direction is therefore filtered out, so the computing system 104 does not mistake sound source 116b for commands from the player 102 (sound source 116a).

The image-sound capture device 106 includes the image capture unit 106b and the sound capture unit 106a. The image-sound capture device 106 can digitally capture image frames and transfer them to the computing system 104 for further processing. An example of the image capture unit 106b is a web camera. Web cameras are commonly used to capture video images and transfer them digitally to a computing device, either for subsequent storage or for communication over a network such as the Internet. Other types of image capture devices, analog or digital, also work, as long as the image data can be processed digitally to enable the identification and filtering. In one preferred embodiment, the digital processing that enables the filtering is performed in software after the input data is received. The sound capture unit 106a is shown including a pair of microphones (MIC1 and MIC2). These are standard microphones, which can be integrated into the housing of the image-sound capture device 106.

FIG. 3A illustrates the sound capture unit 106a facing sound sources 116, which consist of sound A and sound B. As shown, sound A projects its audible sound, which is detected by MIC1 and MIC2 along sound paths 201a and 201b.
Sound B is projected toward MIC1 and MIC2 along sound paths 202a and 202b. As illustrated, the two sound paths of sound A have different lengths, which produces a relative delay compared with sound paths 202a and 202b. The sound from each of sound A and sound B is then processed using a standard triangulation algorithm so that direction selection can occur in block 216, as shown in FIG. 3B. The sound from MIC1 and MIC2 is buffered in buffers 1 and 2 (210a, 210b) and passed through delay lines (212a, 212b). In one embodiment, the buffering and delay process is controlled by software, although hardware can also be custom designed to handle these operations. Based on the triangulation, direction selection 216 triggers the identification and selection of one of the sound sources 116.

The sound from each of MIC1 and MIC2 is summed in block 214 before being output as the selected sound source. In this manner, sound coming from directions other than the direction of the active image area is filtered out. Those sound sources then do not distract the processing performed by the computer system 104. Nor do they disrupt communication with other users who may be playing a video game interactively over a network or the Internet.

FIG. 4 illustrates a computing system 250 that may be used with the image-sound capture device 106, in accordance with an embodiment of the present invention. The computing system 250 includes a processor 252 and a memory 256. A bus 254 interconnects the processor and the memory 256 with the image-sound capture device 106. The memory 256 includes at least part of an interactive program 258. It also includes selective sound source listening logic or code 260 for processing the received sound source data.
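The buffer, delay-line, and summing stages of FIG. 3B resemble a classic two-microphone delay-and-sum beamformer. That is one plausible way the selective sound source listening logic 260 could be realized. The Python sketch below assumes integer-sample delays, a far-field source, and a speed of sound of 343 m/s. `select_lag` picks the steering lag whose summed output is loudest. This is a simple stand-in for direction selection 216, not the triangulation the text describes. All names (`steering_lag`, `delay_line`, `delay_and_sum`, `select_lag`) are hypothetical.

```python
import math
import random

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees C


def steering_lag(angle_deg, mic_spacing_m, sample_rate_hz):
    """Whole-sample lag between MIC1 and MIC2 for a far-field source.

    0 degrees is broadside; positive angles are on MIC1's side, so the sound
    reaches MIC1 first and MIC2 that many samples later.
    """
    tau_s = mic_spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND_M_S
    return round(tau_s * sample_rate_hz)


def delay_line(buffer, n):
    """Delay a buffered signal by n >= 0 samples, zero-padding the front."""
    return [0.0] * n + buffer[: len(buffer) - n]


def delay_and_sum(mic1, mic2, lag):
    """Align the two buffers for a given lag and average them (cf. block 214).

    A positive lag means the wanted sound reaches MIC2 `lag` samples after
    MIC1, so MIC1 is delayed to line the two copies up.
    """
    if lag >= 0:
        a, b = delay_line(mic1, lag), mic2
    else:
        a, b = mic1, delay_line(mic2, -lag)
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def power(signal):
    """Mean squared amplitude."""
    return sum(x * x for x in signal) / len(signal)


def select_lag(mic1, mic2, max_lag):
    """Steer over every candidate lag and keep the loudest summed output."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: power(delay_and_sum(mic1, mic2, lag)))


# Simulated broadband source that reaches MIC2 three samples after MIC1.
rng = random.Random(0)
source = [rng.uniform(-1.0, 1.0) for _ in range(2003)]
mic1 = source[3:]   # MIC1 hears the source first
mic2 = source[:-3]  # MIC2 hears the same sound 3 samples later

lag = select_lag(mic1, mic2, max_lag=6)
focused = delay_and_sum(mic1, mic2, lag)
print("selected lag:", lag, "relative power:", round(power(focused) / power(mic1), 2))
```

Sound from the steered direction adds coherently. Sound from other directions is averaged with misaligned copies of itself and is partially attenuated, which is the filtering effect described above. Practical systems usually add fractional delays and more microphones.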
Based on where the image capture unit 106b identifies the focus area, sound sources outside the focus area are selectively filtered out by the selective sound source listening logic 260, which is executed by the processor and stored at least partially in the memory 256. The computing system is shown in its simplest form. Any hardware configuration can be used, as long as it can execute the instructions for processing the incoming sound sources and performing the selective listening.

The computing system 250 is also shown interconnected with the display screen 110 by way of the bus. In this example, the focus area is identified by the image capture unit, which is focused toward sound source B. The sound capture unit 106a captures the sound and transfers it to the computing system 250. Sound from other sound sources, such as sound source A, is then filtered out by the selective sound source listening logic 260.

In one specific example, a player may be taking part in an Internet or networked video game competition with another user, where each user's primary audible experience comes through speakers. The speakers may be part of the computing system or part of the monitor 108. Suppose, then, that the local speakers are generating sound source A, as shown in FIG. 4. The selective sound source listening logic 260 filters out the sound of sound source A so that it is not fed back to the competing user. As a result, the competing user does not hear feedback of his or her own sound or voice. With this filtering, interactive communication over a network can take place while interfacing with a video game, without destructive feedback.

FIG. 5 illustrates an example in which the image-sound capture device 106 includes at least four microphones (MIC1 through MIC4).
The sound capture unit 106a can therefore perform triangulation with finer granularity to identify the locations of the sound sources 116 (A and B). Providing an additional microphone allows the location of a sound source to be defined more precisely. Sound sources that are irrelevant, or that disrupt game play or interaction with a computing system, can then be eliminated and filtered out. As illustrated in FIG. 5, sound source 116 (B) is the sound source of interest, as identified by the video capture unit 106b. Continuing the example of FIG. 5, FIG. 6 illustrates how sound source B is localized to a spatial volume.

The spatial volume in which sound source B is located defines the focus volume 274. By identifying a focus volume, noise that is not within that volume (that is, not in the right direction) can be eliminated or filtered out. To facilitate selection of a focus volume 274, the image-sound capture device 106 includes at least four microphones, at least one of which lies in a different plane from the others. One of the four microphones is kept in plane 271 and the remaining microphones are kept in plane 270 of the image-sound capture device 106. This arrangement allows a spatial volume to be defined.

Consequently, sound from other people nearby (shown as 276a and 276b) is filtered out, because they are not within the spatial volume defined by the focus volume 274. Likewise, noise produced outside the spatial volume, such as by the speaker 276c, falls outside the volume and is also filtered out.

FIG. 7 illustrates a flowchart in accordance with an embodiment of the present invention. The method begins at operation 302, in which input is received from one or more sound sources at two or more sound capture microphones. In one example, the two or more sound capture microphones are integrated into the image-sound capture device 106.
Alternatively, the two or more sound capture microphones can be part of a second module or housing that interfaces with the image capture unit 106b. The sound capture unit 106a can also include any number of sound capture microphones. These can be placed at specific locations designed to capture sound from a user interfacing with a computing system.

The method moves to operation 304, in which a delay path is determined for each of the sound sources. Example delay paths are defined by the sound paths 201 and 202 of FIG. 3A. As is well known, a delay path defines the time a sound wave takes to travel from a sound source to a specific microphone positioned to capture the sound. The delays, and the approximate location from which the sound originates, can be determined with a standard triangulation algorithm. This determination is based on the time sound takes to travel from each particular sound source 116.

The method then continues to operation 306, in which a direction is identified for each received input from the one or more sound sources. That is, the direction from which the sound of a sound source 116 arrives is identified relative to the location of the image-sound capture device, including the sound capture unit 106a. Based on the identified directions, sound sources that are not in an identified direction of a focus area (or volume) are filtered out in operation 308. Sound sources that do not originate from directions near the focus area are filtered out. The remaining, unfiltered sound sources can then be used to interact with a computer program, as shown in operation 310.
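Operations 304 through 308 can be sketched for a two-microphone array using the far-field relation between delay and bearing, sin θ = c·τ / d. Here τ is the inter-microphone delay, d the microphone spacing, and c the speed of sound. The sketch estimates τ as the peak of the cross-correlation between the two buffers. That is one common technique and not necessarily the triangulation method the text refers to. Several values are assumptions for illustration: the 0.2 m spacing, the 16 kHz sample rate, the 30-degree focus half-width, and the simulated white-noise sources. The function names are hypothetical.

```python
import math
import random

SPEED_OF_SOUND_M_S = 343.0  # assumed
MIC_SPACING_M = 0.2          # assumed spacing between MIC1 and MIC2
SAMPLE_RATE_HZ = 16000       # assumed
RNG = random.Random(1)       # seeded so the simulation is repeatable


def estimate_lag(mic1, mic2, max_lag):
    """Operation 304 (sketch): samples by which MIC2 trails MIC1,
    taken as the peak of their cross-correlation."""
    def xcorr(lag):
        # Sum of mic1[k - lag] * mic2[k] over the overlapping samples.
        lo, hi = max(0, lag), min(len(mic2), len(mic1) + lag)
        return sum(mic1[k - lag] * mic2[k] for k in range(lo, hi))
    return max(range(-max_lag, max_lag + 1), key=xcorr)


def lag_to_bearing(lag, spacing_m=MIC_SPACING_M, rate_hz=SAMPLE_RATE_HZ):
    """Operation 306 (sketch): far-field bearing from sin(theta) = c * tau / d.
    The ratio is clamped so quantization cannot push it outside [-1, 1]."""
    ratio = SPEED_OF_SOUND_M_S * (lag / rate_hz) / spacing_m
    return math.degrees(math.asin(max(-1.0, min(1.0, ratio))))


def in_focus(bearing_deg, half_width_deg=30.0):
    """Operation 308 (sketch): keep inputs whose direction is in the focus area."""
    return abs(bearing_deg) <= half_width_deg


def simulate(true_lag, n=1500):
    """White-noise source reaching MIC2 `true_lag` (>= 0) samples after MIC1."""
    s = [RNG.uniform(-1.0, 1.0) for _ in range(n + true_lag)]
    return s[true_lag:], s[:n]


# Largest physically possible lag: spacing / speed of sound, in samples.
max_lag = math.floor(MIC_SPACING_M / SPEED_OF_SOUND_M_S * SAMPLE_RATE_HZ)
for name, true_lag in [("player_116a", 1), ("observer_116b", 7)]:
    m1, m2 = simulate(true_lag)
    lag = estimate_lag(m1, m2, max_lag)
    bearing = lag_to_bearing(lag)
    print(name, lag, round(bearing, 1), "keep" if in_focus(bearing) else "filter")
```

Under these assumptions, the simulated player (one sample off broadside) maps to about 6 degrees and is kept. The simulated observer (seven samples off) maps to about 49 degrees and is filtered out. Whole-sample lags limit angular resolution toward the ends of the range; interpolating around the correlation peak is a common refinement.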
The interactive program of operation 310 can be, for example, a video game in which the user interacts with the features of the game or communicates with other players who oppose the primary player. The opposing players can be local or at a remote location, communicating with the primary user over a network such as the Internet. The video game can also be played among a group of users who interactively challenge each other's skills in a particular contest associated with the video game.

FIG. 8 illustrates a flowchart in which the image-sound capture device operations 320 are shown separately from the software-executed operations 340 performed on the received input. Once the input from one or more sound sources at two or more sound capture microphones is received in operation 302, the method proceeds to operation 304. There, software determines the delay path for each sound source. Based on the delay paths, a direction is identified in operation 306 for each received input from each of the one or more sound sources, as described above.

The method then moves to operation 312, in which the identified directions in the proximity of the video capture are determined. For example, the video capture is targeted at an active image area (or volume), and the directions of sound sources within or near that active image area are determined. Based on this determination, the method proceeds to operation 314, in which directions (or volumes) not in the proximity of the video capture are filtered out. This step therefore filters out distractions, noise, and other extraneous input that could interfere with the primary player's game play. It is carried out by software executing during game play.
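When operations 312 and 314 work with a volume rather than a direction, the source must be located in three dimensions. That is why FIG. 6 places one microphone off the plane of the others. A minimal, noise-free sketch follows. It assumes a hypothetical geometry: three microphones on the front plane (like plane 270), with MIC4 set back 10 cm (like plane 271). It localizes a source by grid search, choosing the candidate point whose predicted path-length differences best match the measured ones. It then tests the estimate against an assumed axis-aligned box standing in for focus volume 274. The grid, box bounds, and positions are illustrative only. With a compact array and real, noisy delays, distance in particular is poorly constrained.

```python
import itertools
import math

# Hypothetical geometry in metres: x = left/right, y = down/up, z = distance in
# front of the device. MIC1-MIC3 lie on the front plane z = 0 (like plane 270);
# MIC4 is set back 10 cm on a second plane (like plane 271).
MICS = [(-0.1, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, 0.0, -0.1)]


def path_differences(point):
    """Extra path length (m) from `point` to MIC2..MIC4 relative to MIC1.
    Dividing by the speed of sound gives the corresponding inter-mic delays."""
    d1 = math.dist(point, MICS[0])
    return [math.dist(point, m) - d1 for m in MICS[1:]]


def frange(start, step, count):
    return [start + step * i for i in range(count)]


# Candidate source positions: a coarse 0.25 m grid in front of the device.
GRID = list(itertools.product(frange(-2.0, 0.25, 17),
                              frange(-1.0, 0.25, 9),
                              frange(0.5, 0.25, 15)))


def localize(measured):
    """Grid point whose predicted path differences best match `measured`."""
    def mismatch(p):
        return sum((a - b) ** 2 for a, b in zip(path_differences(p), measured))
    return min(GRID, key=mismatch)


def in_focus_volume(point, x=(-0.75, 0.75), y=(-1.0, 1.0), z=(1.0, 3.0)):
    """Axis-aligned box standing in for focus volume 274 (bounds assumed)."""
    return all(lo <= c <= hi for c, (lo, hi) in zip(point, (x, y, z)))


scene = {"player": (0.25, 0.0, 2.0),
         "person_276a": (1.5, 0.0, 1.5),
         "speaker_276c": (-1.75, 0.5, 3.5)}
for name, true_pos in scene.items():
    estimate = localize(path_differences(true_pos))  # noise-free "measurement"
    print(name, estimate, "keep" if in_focus_volume(estimate) else "filter")
```

In this idealized scene, the simulated player falls inside the box and is kept. The nearby person 276a and the speaker 276c are located outside it and are filtered out.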
The primary user can therefore interact with the video game, interact with other users playing the video game, or communicate with other users over a network that may be logged into or associated with the video game. Such video game communication, interactivity, and control are thus not interrupted by extraneous noise or by observers who do not intend to communicate interactively or take part in a particular game or interactive program.

It should be noted that the embodiments described herein also apply to online gaming applications. That is, the embodiments described above can run on a server that sends a video signal to multiple users over a distributed network such as the Internet. This allows players at noisy remote locations to communicate with each other. It should further be noted that the embodiments described herein can be implemented in either hardware or software. The functional descriptions above can be synthesized to define a microchip with logic that performs the functional tasks of each module associated with the noise-cancellation scheme.

Selective filtering of sound sources can also have other applications, such as telephones. In a telephone environment, there is usually a primary person (the caller) who wishes to converse with a third party (the callee). During that communication, however, other people nearby may be talking or making noise. A phone targeted toward the primary user (for example, by the direction of the receiver) can make the sound from the primary user's mouth the focus area. This enables selective listening to the primary user alone. Such selective listening substantially filters out voices or noise not associated with the primary person, so the receiving party hears the person using the phone more clearly.
Additional applications may include other electronic equipment that can benefit from using voice as an input for control or communication. For example, a user can control the settings of a car with voice commands while other passengers are kept from interfering with those commands. Other applications include controlling computer applications, such as browsing, document preparation, or communication. With this filtering, voice commands can be issued more effectively without being disrupted by surrounding sounds. Likewise, such applications may extend to any electronic device. Further, embodiments of the invention have broad applicability, and the scope of the claims should be construed to cover any application that benefits from these embodiments.
For example, in a related application, sound sources can be filtered out using sound analysis. When sound analysis is used, as few as one microphone may suffice. The sound captured by a single microphone can be analyzed digitally (in software or hardware) to determine which voices or sounds are relevant. In some environments, such as gaming, the primary user can record his or her voice to train the system to recognize that particular voice, so that other voices or sounds can be excluded. Because filtering can then be performed based on sound pitch or frequency, no direction needs to be identified. All of the sound-filtering advantages mentioned above apply equally when direction and volume are also taken into account.
With the above embodiments in mind, it should be understood that the invention can employ various computer-implemented operations involving data stored in computer systems. These operations are those requiring physical manipulation of physical quantities.
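The single-microphone variant that filters by pitch can be sketched as follows. This is a minimal illustration, not the method of the specification: it assumes the training step reduces the user's recorded voice to a pitch range in hertz, estimates each frame's fundamental frequency by picking the autocorrelation peak within an assumed voice range of 50 to 400 Hz, and accepts a frame only if its pitch falls inside the enrolled range. All names, margins, and thresholds are hypothetical.

```python
import math


def estimate_pitch(frame, sample_rate_hz, min_f0=50.0, max_f0=400.0):
    """Estimate the fundamental frequency of a frame via its autocorrelation peak."""
    n = len(frame)
    min_lag = int(sample_rate_hz // max_f0)
    max_lag = min(int(sample_rate_hz // min_f0), n - 1)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate_hz / best_lag


def enroll(training_pitches_hz, margin=0.15):
    """Turn pitches measured from the user's training recording into an accepted range."""
    return (min(training_pitches_hz) * (1 - margin),
            max(training_pitches_hz) * (1 + margin))


def is_enrolled_voice(frame, sample_rate_hz, pitch_range):
    """True if the frame's estimated pitch lies inside the enrolled range."""
    low, high = pitch_range
    return low <= estimate_pitch(frame, sample_rate_hz) <= high


def tone(freq_hz, sample_rate_hz=8000, length=800):
    """Toy voiced frame: a pure sine wave."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(length)]


user_range = enroll([190.0, 210.0])  # roughly (161.5, 241.5)
print(round(estimate_pitch(tone(200), 8000)))          # 200
print(is_enrolled_voice(tone(200), 8000, user_range))  # True
print(is_enrolled_voice(tone(320), 8000, user_range))  # False
```

Because the unnormalized autocorrelation shrinks as the overlap shortens, the first period wins over its multiples on these clean tones; real speech would additionally need voicing detection and a more robust peak picker.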
Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. Further, the manipulations performed are often referred to in terms such as producing, identifying, determining, or comparing. The invention described above can be practiced with other computer system configurations, including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like. The invention can also be practiced in distributed computing environments where tasks are performed by remote processing devices linked through a communication network.
The invention can also be embodied as computer-readable code on a computer-readable medium. The computer-readable medium is any data storage device that can store data which can thereafter be read by a computer system (including an electromagnetic wave carrier). Examples of computer-readable media include hard drives, network attached storage (NAS), read-only memory, random-access memory, CD-ROM, CD-R, CD-RW, magnetic tape, and other optical and non-optical data storage devices. The computer-readable medium can also be distributed over network-coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion.
Although the foregoing invention has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims. Accordingly, the present embodiments are to be considered illustrative and not restrictive, and the invention is not to be limited to the details given herein but may be modified within the scope and equivalents of the appended claims.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention, together with further advantages thereof, will be best understood by reference to the following detailed description taken in conjunction with the accompanying drawings.
Figure 1 illustrates a gaming environment in which a video game program can be executed to interact with one or more users, in accordance with an embodiment of the invention.
Figure 2 is a three-dimensional diagram of an example image-sound capture device, in accordance with an embodiment of the invention.
Figure 3A illustrates the processing of sound paths at different microphones designed to receive input, in accordance with an embodiment of the invention, and Figure 3B illustrates logic for outputting a selected sound source, in accordance with an embodiment of the invention.
Figure 4 illustrates an example computing system coupled to the image-sound capture device for processing input sound sources, in accordance with an embodiment of the invention.
Figure 5 illustrates an example in which multiple microphones are used to increase the accuracy of identifying the direction of a particular sound source, in accordance with an embodiment of the invention.
Figure 6 illustrates an example in which sound is identified in a particular volume of space using microphones in different planes, in accordance with an embodiment of the invention.
Figures 7 and 8 illustrate exemplary method operations, in accordance with embodiments of the invention, that can be performed while identifying sound sources and excluding sound sources outside the focus.
[Description of Main Reference Numerals]
100: Game environment
102: Player
102': Translucent image
103: Game observer
104, 250: Computing system
106: Image-sound capture device
106a: Sound capture unit
106b: Image capture unit
108: Monitor
110: Display screen
112: Game figure
114: Interactive icon
116, 116a, 116b: Sound source
117: Non-verbal sound
201, 201a, 201b, 202, 202a, 202b: Sound path
210a: Buffer 1
210b: Buffer 2
212a, 212b: Delay line
214: Block
216: Block (direction selection)
252: Processor
254: Bus
256: Memory
258: Interactive program
260: Selective sound source listening logic (code)
270, 271: Plane
274: Focus volume
276a, 276b: Other persons
276c: Speaker
302, 304, 306, 308, 310, 312, 314, 320, 340: Operation
MIC1, MIC2, MIC3, MIC4: Microphone