TW202315414A - Infrared remote controlled video and audio device and infrared remote control method of playing video and audio - Google Patents
Infrared remote controlled video and audio device and infrared remote control method of playing video and audio
- Publication number
- TW202315414A (application TW110134476A)
- Authority
- TW
- Taiwan
- Prior art keywords
- audio
- target
- indication signal
- processing
- infrared remote
- Prior art date
Description
The present invention relates to a method for controlling video and audio playback, and in particular to an infrared remote control method of playing video and audio.
A traditional television must be connected to a digital TV antenna, an HDMI audio/video input, or cable TV to show programs, and viewers watch according to a fixed broadcast schedule. With the rise of smart home appliances in recent years, every appliance, televisions included, is expected to offer features such as Internet connectivity and links to applications (apps). Like a smartphone, a smart TV can go online and play videos from platforms such as YouTube, NETFLIX, Apple TV+, and myVideo. However, when a television plays a video today, the sounds are usually mixed together in the output, so users often cannot tell which character on screen a given sound belongs to.
In view of the above, the present invention provides an infrared remote controlled video and audio device and an infrared remote control method of playing video and audio that can provide the voice of a specified character.
An infrared remote controlled video and audio device according to an embodiment of the present invention includes an infrared sensor, a display interface, an audio output interface, a memory, and a processor, where the processor is connected to the infrared sensor, the display interface, the audio output interface, and the memory. The infrared sensor receives a first indication signal and a second indication signal. The display interface plays multiple display frames of a video. The audio output interface outputs the audio of the video. The memory stores multiple preprocessed character feature groups, multiple preprocessed audio tracks, and the correspondence between the preprocessed character feature groups and the preprocessed audio tracks. The processor is configured to: according to the first indication signal, pause the playback of the display interface and the output of the audio output interface, so that the display interface displays a pause screen; according to the second indication signal, determine a target character pattern located in a target block of the pause screen, where the second indication signal indicates the target block and the target character pattern matches one of the preprocessed character feature groups; according to the correspondence between the preprocessed character feature groups and the preprocessed audio tracks, extract from the audio a determined audio track corresponding to the target character pattern; and control the display interface to resume playback and control the audio output interface to output the determined audio track.
An infrared remote control method of playing video and audio according to an embodiment of the present invention includes: playing multiple display frames of a video through a display interface, and outputting the audio of the video through an audio output interface; receiving a first indication signal through an infrared sensor; pausing, by a processor and according to the first indication signal, the playback of the display interface and the output of the audio output interface, so that the display interface displays a pause screen; receiving a second indication signal through the infrared sensor; determining, by the processor and according to the second indication signal, a target character pattern located in a target block of the pause screen, where the second indication signal indicates the target block and the target character pattern matches one of multiple preprocessed character feature groups; extracting from the audio, by the processor and according to the correspondence between the preprocessed character feature groups and multiple preprocessed audio tracks, a determined audio track corresponding to the target character pattern; and controlling the display interface to resume playback and controlling the audio output interface to output the determined audio track.
With the above structure, the infrared remote controlled video and audio device and the infrared remote control method of playing video and audio disclosed herein use the correspondence between multiple preprocessed character feature groups and multiple preprocessed audio tracks to determine the audio track of the character specified by an infrared indication signal, and can thereby play the voice of the specified character on its own.
The above description of the present disclosure and the following description of the embodiments are intended to demonstrate and explain the spirit and principle of the present invention, and to provide further explanation of the scope of the patent claims.
The detailed features and advantages of the present invention are described in the embodiments below. The content is sufficient for anyone skilled in the relevant art to understand the technical content of the present invention and implement it accordingly, and, based on the content disclosed in this specification, the claims, and the drawings, anyone skilled in the relevant art can readily understand the related objects and advantages of the present invention. The following embodiments describe aspects of the present invention in further detail, but do not limit the scope of the present invention in any way.
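Please refer to FIG. 1, which is a functional block diagram of an infrared remote controlled video and audio device according to an embodiment of the present invention. As shown in FIG. 1, the infrared remote controlled video and audio device 10 includes an infrared sensor 11, a display interface 13, an audio output interface 15, a memory 17, and a processor 19, where the processor 19 is connected to the infrared sensor 11, the display interface 13, the audio output interface 15, and the memory 17 in a wired or wireless manner. In particular, the infrared remote controlled video and audio device 10 can be implemented as a smart TV, but is not limited thereto.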
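The infrared sensor 11 receives infrared indication signals, for example those emitted by a TV remote control. More specifically, an infrared indication signal can carry a different code, or another marker that the processor 19 can recognize, depending on which button of the TV remote control was triggered (for example, pressed). The display interface 13 is, for example, a screen, and the audio output interface 15 is, for example, a speaker. Together they play the video: the display interface 13 plays the display frames of the video, and the audio output interface 15 outputs the audio of the video.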
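The memory 17 is, for example, a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), a dynamic random access memory (DRAM), a static random access memory (SRAM), or another non-volatile memory. The memory 17 can be a local storage medium or a remote storage medium, such as a cloud database. The memory 17 stores multiple preprocessed character feature groups, multiple preprocessed audio tracks, and the correspondence between the preprocessed character feature groups and the preprocessed audio tracks, where the correspondence is stored, for example, in the form of a lookup table.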
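The processor 19 is, for example, a central processing unit, a microcontroller, a programmable logic controller, or another processor. According to the infrared indication signals received by the infrared sensor 11, the processor 19 processes the video played by the display interface 13 and the audio output interface 15 so as to play the sound corresponding to a specified character. The detailed steps are described below.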
Please refer to FIGS. 1 and 2 together, where FIG. 2 is a flow chart of an infrared remote control method of playing video and audio according to an embodiment of the present invention. As shown in FIG. 2, the method can include steps S201 to S207. The method shown in FIG. 2 can be executed by the infrared remote controlled video and audio device 10 shown in FIG. 1, but is not limited thereto. For ease of understanding, the method is described below using the operation of the infrared remote controlled video and audio device 10 as an example.
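In step S201, the infrared remote controlled video and audio device 10 plays multiple display frames of a video through the display interface 13 and outputs the audio of the video through the audio output interface 15. In steps S202 and S203, the device 10 receives a first indication signal through the infrared sensor 11, and the processor 19, according to the first indication signal, pauses the playback of the display interface 13 and the output of the audio output interface 15 so that the display interface 13 displays a pause screen; that is, playback of the video is paused. More specifically, the first indication signal can originate from an infrared remote control with multiple buttons. When a specific button or button combination is triggered, the infrared remote control outputs the first indication signal, which carries a code representing that button or button combination (hereinafter, the pause code). For example, the specific button can be the number zero button or the pause button; the present invention imposes no limitation. The processor 19 stores in advance the correspondence between the pause code and the operation of pausing video playback. When the processor 19 receives the first indication signal, it can control the display interface 13 and the audio output interface 15 to pause the video according to the pause code carried by the signal.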
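The pause handling in steps S202 and S203 amounts to a code lookup. The following is a minimal Python sketch under stated assumptions: the `Player` class, the `PAUSE_CODE` value, and `handle_first_signal` are hypothetical names introduced here for illustration, since the source does not specify any code values or software interfaces.

```python
from dataclasses import dataclass

# Hypothetical pause code; the source does not give actual IR code values.
PAUSE_CODE = 0x10


@dataclass
class Player:
    """Toy stand-in for the display interface and the audio output interface."""
    video_playing: bool = True
    audio_playing: bool = True

    def pause(self) -> None:
        # Freeze the current frame (the pause screen) and stop audio output.
        self.video_playing = False
        self.audio_playing = False


def handle_first_signal(player: Player, code: int) -> bool:
    """Pause playback if the received code equals the pre-stored pause code.

    Returns True when the signal was recognized as a pause request.
    """
    if code == PAUSE_CODE:
        player.pause()
        return True
    return False
```

Any code other than the pause code leaves playback untouched, which mirrors the idea that only the specific button or button combination triggers the pause.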
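In step S204, the infrared remote controlled video and audio device 10 receives a second indication signal through the infrared sensor 11. Note that step S204 may alternatively be performed between step S202 and step S203. For example, the user can send two indication signals in succession with the infrared remote control. Based on the order in which the infrared sensor 11 receives the two signals, the processor 19 can treat the one received first as the first indication signal and the one received later as the second indication signal.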
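Next, the infrared remote controlled video and audio device 10 performs steps S205 to S207 with the processor 19. In step S205, the processor 19 determines, according to the second indication signal, a target character pattern located in a target block of the pause screen, where the second indication signal indicates the target block and the target character pattern matches one of the preprocessed character feature groups. More specifically, the processor 19 can divide the pause screen into multiple divided blocks and assign each divided block a designated code, with the designated code of each block corresponding to a different button or button combination on the infrared remote control. The second indication signal generated by the infrared remote control carries the designated code that corresponds to the button or button combination that was triggered. When the processor 19 receives the second indication signal, it can determine that the divided block corresponding to the designated code is the target block. In particular, the buttons or button combinations that correspond to designated codes differ from the button or button combination that corresponds to the pause code.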
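The mapping from a designated code to a divided block can be sketched as follows. This is an illustrative Python sketch, not the patented implementation: the function name `block_for_code`, the 3×3 default grid, and the row-major numbering (code 1 at the top-left) are assumptions, since the text only says that each block receives a designated code.

```python
def block_for_code(code: int, width: int, height: int,
                   rows: int = 3, cols: int = 3) -> tuple[int, int, int, int]:
    """Return the pixel rectangle (x0, y0, x1, y1) of the divided block
    selected by a designated code.

    Assumption: codes run 1..rows*cols in row-major order from the top-left.
    """
    if not 1 <= code <= rows * cols:
        raise ValueError(f"designated code {code} is outside 1..{rows * cols}")
    row, col = divmod(code - 1, cols)
    # Integer arithmetic keeps adjacent blocks gap-free even when the
    # screen size is not an exact multiple of the grid size.
    x0 = col * width // cols
    x1 = (col + 1) * width // cols
    y0 = row * height // rows
    y1 = (row + 1) * height // rows
    return (x0, y0, x1, y1)
```

For a 1920×1080 pause screen, `block_for_code(5, 1920, 1080)` gives the center rectangle `(640, 360, 1280, 720)`.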
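Based on the preprocessed character feature groups stored in the memory 17, the processor 19 determines whether the target block contains a matching target character pattern. More specifically, each preprocessed character feature group includes multiple features, such as features of the facial organs (eyes, nose, mouth) and of the face shape. When the processor 19 determines that the target block contains a preset number of features identical to those in a single preprocessed character feature group, it determines that the target block contains a target character pattern matching that group. In particular, when the processor 19 determines that no pattern in the target block matches any preprocessed character feature group, it either takes no action or outputs a request through the display interface 13 or the audio output interface 15 to ask for another second indication signal, and performs step S205 again after receiving the new second indication signal.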
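In step S206 of FIG. 2, the processor 19 extracts from the audio a determined audio track corresponding to the target character pattern, according to the correspondence between the preprocessed character feature groups and the preprocessed audio tracks. More specifically, the preprocessed character feature groups respectively contain the facial-organ features, face-shape features, and so on of the different characters in the video; the preprocessed audio tracks respectively contain the voices of the different characters in the video; and the correspondence indicates which preprocessed character feature group and which preprocessed audio track belong to the same character. The preprocessed character feature groups, the preprocessed audio tracks, and the correspondence between them can be produced with artificial intelligence (AI) techniques by the processor 19 or by an external processor (for example, a cloud server) before the video is played and then stored in the memory 17, or produced with AI techniques by the processor 19 during playback before the first indication signal is received and then stored in the memory 17. The AI processing is described further below. In one implementation, the preprocessed audio tracks are obtained by processing the audio of only part of the video. The processor 19 can use the correspondence to determine the preprocessed audio track that corresponds to the target character pattern, and then, based on the voiceprint of that preprocessed audio track, extract from the audio the determined audio track with the same voiceprint. In another implementation, the preprocessed audio tracks are obtained by processing the audio of the complete video. The processor 19 can use the correspondence to determine the preprocessed audio track that corresponds to the target character pattern and use that preprocessed audio track directly as the determined audio track.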
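In step S207, the processor 19 controls the display interface 13 to resume playback and controls the audio output interface 15 to output the determined audio track. In one implementation, the processor 19 can control the audio output interface 15 to output only the determined audio track and none of the other tracks in the audio. In another implementation, the processor 19 can control the audio output interface 15 to output the determined audio track at a higher volume than the other tracks.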
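Both output modes of step S207 can be expressed as one mixing function with a gain for the non-target tracks. The sketch below is illustrative only: `mix_tracks`, the per-track sample lists, and the `other_gain` parameter are assumptions made here, not part of the source.

```python
def mix_tracks(tracks: dict[str, list[float]], target_id: str,
               other_gain: float = 0.0) -> list[float]:
    """Mix separated tracks so that the determined track stands out.

    other_gain = 0.0 outputs only the determined track; a value between
    0 and 1 keeps the other tracks audible at a lower volume.
    """
    if target_id not in tracks:
        raise KeyError(f"unknown track: {target_id}")
    length = max(len(samples) for samples in tracks.values())
    mixed = [0.0] * length
    for track_id, samples in tracks.items():
        gain = 1.0 if track_id == target_id else other_gain
        for i, sample in enumerate(samples):
            mixed[i] += gain * sample
    return mixed
```

Calling `mix_tracks(tracks, "a")` reproduces the first implementation (determined track only), while `mix_tracks(tracks, "a", other_gain=0.25)` corresponds to the second, where the determined track is louder than the rest.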
To illustrate steps S205 to S207 with a concrete example, please refer to FIG. 3 and FIG. 4. FIG. 3 is a usage scenario diagram of an infrared remote controlled video and audio device according to an embodiment of the present invention, and FIG. 4 is a schematic diagram of a pause screen of the device according to an embodiment of the present invention. As shown in FIG. 3, the pause screen F3 of a TV screen serving as the display interface can be divided into nine divided blocks F31, which can be assigned designated codes 1 to 9, corresponding to number buttons 1 to 9 on the TV remote control. When number button 1 is triggered and a second indication signal is sent to the TV, the TV's processor determines that the target block is the divided block F31 with designated code 1, and so on for the other buttons.
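Assume that the pause screen F4 shown in FIG. 4 has the designated code layout shown in FIG. 3. When the second indication signal indicates designated code 4, the processor determines from the preprocessed character feature groups in the memory that the target block contains a matching target character pattern 41, then uses the correspondence between the preprocessed character feature groups and the preprocessed audio tracks to extract from the audio the determined audio track corresponding to target character pattern 41, and controls the display interface to resume playback and the audio output interface to output the determined audio track. When the second indication signal indicates designated code 5, the processor likewise determines target character pattern 42, extracts from the audio the determined audio track corresponding to target character pattern 42, and controls the display interface to resume playback and the audio output interface to output the determined audio track. In particular, the dotted lines that divide the pause screen F4 in FIG. 4 can optionally be shown on the actual display. For example, when the processor controls the display interface to pause playback, it can at the same time control the display interface to show the dotted lines dividing the pause screen F4, making it easier for the user to select the target block.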
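FIG. 4 illustrates an embodiment in which the target block contains a single target character pattern 41 or 42. In an embodiment in which the target block contains multiple target character patterns, the processor can use the correspondence between the preprocessed character feature groups and the preprocessed audio tracks to extract from the audio multiple determined audio tracks, one for each target character pattern, and control the display interface to resume playback and the audio output interface to output those determined audio tracks.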
For another embodiment in which the target block contains multiple target character patterns, please refer to FIG. 1, FIG. 2, and FIG. 5 together, where FIG. 5 is a flow chart of the infrared remote control method of playing video and audio according to that embodiment. The method shown in FIG. 5 includes steps S201 to S204, S206, and S207 described in the embodiment of FIG. 2; the difference is that step S205 includes steps S501 to S505. Steps S501 to S505 can likewise be executed by the infrared remote controlled video and audio device 10 shown in FIG. 1, but are not limited thereto. For ease of understanding, steps S501 to S505 are described below using the operation of the device 10 as an example.
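In step S501, the processor 19 determines whether the number of preselected patterns in the target block of the pause screen (the block indicated by the second indication signal) that match a preprocessed character feature group is exactly one. If so, as shown in step S502, the processor 19 takes the preselected pattern as the target character pattern and proceeds to step S206. If not (that is, there are multiple preselected patterns), as shown in step S503, the processor 19 requests a third indication signal through the display interface 13 and/or the audio output interface 15. For example, the processor 19 can control the display interface 13 to enlarge the target block to fill the screen and/or to display a text message requesting the third indication signal. As another example, the processor 19 can control the audio output interface 15 to output a voice message requesting the third indication signal.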
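In steps S504 and S505, the processor 19 obtains the third indication signal through the infrared sensor 11 and, according to the third indication signal, determines that the preselected pattern located in a target sub-block of the target block is the target character pattern, where the third indication signal indicates the target sub-block. More specifically, the processor 19 can divide the target block into multiple divided sub-blocks and assign each divided sub-block a designated code, with the designated code of each divided sub-block corresponding to a different button or button combination on the infrared remote control. The third indication signal generated by the infrared remote control carries the designated code that corresponds to the button or button combination that was triggered. When the processor 19 receives the third indication signal, it can determine that the divided sub-block corresponding to the designated code is the target sub-block. The processor 19 takes the preselected pattern located in the target sub-block as the target character pattern and proceeds to step S206. In particular, when the processor 19 determines that the target sub-block contains no preselected pattern, it either takes no action or outputs a request through the display interface 13 or the audio output interface 15 to ask for another third indication signal, and performs step S505 again after receiving the new third indication signal.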
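To illustrate steps S501 to S505 with a concrete example, please refer to FIGS. 6A and 6B, which are schematic diagrams of pause screens of an infrared remote controlled video and audio device according to another embodiment of the present invention. Assume that the pause screen F6 shown in FIG. 6A has the designated code layout shown in FIG. 3. When the second indication signal indicates designated code 6, the processor determines that the target block contains exactly one preselected pattern 61 matching a preprocessed character feature group, and takes preselected pattern 61 as the target character pattern. When the second indication signal indicates designated code 4, the processor determines that the target block contains multiple preselected patterns, 62 and 63, matching preprocessed character feature groups, and then controls the display interface to enlarge the target block to fill the screen, as shown in FIG. 6B. Assume that the target block F61 shown in FIG. 6B also has the designated code layout shown in FIG. 3. When the third indication signal indicates designated code 6, the processor determines that preselected pattern 63 is the target character pattern. When the third indication signal indicates designated code 5, the processor determines that preselected pattern 62 is the target character pattern.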
In particular, the dotted lines that divide the pause screen F6 in FIG. 6A and the target block F61 in FIG. 6B can optionally be shown on the actual display. For example, when the processor controls the display interface to pause playback, it can at the same time control the display interface to show the dotted lines dividing the pause screen F6, making it easier for the user to select the target block; and/or, when it controls the display interface to enlarge the target block F61, it can at the same time control the display interface to show the dotted lines dividing the target block F61, making it easier for the user to select the target sub-block.
As described above, the preprocessed character feature groups, the preprocessed audio tracks, and the correspondence between them that are stored in the memory of the infrared remote controlled video and audio device can be produced by the device's internal processor or by an external processor (for example, a cloud server) before the video is played, or by the processor during playback before the first indication signal is received. For the processing flow, please refer to FIG. 7, which is a preprocessing flow chart of the infrared remote control method of playing video and audio according to an embodiment of the present invention. As shown in FIG. 7, the preprocessing flow involves AI techniques and can include steps S701 to S704.
In step S701, the processor performs multi-object tracking on multiple display frames of the video to obtain, for each of multiple characters, the feature blocks corresponding to that character in those display frames. The display frames in question can be all display frames of the video, or the display frames that have already been played before the processor receives the first indication signal. More specifically, the multi-object tracking performed by the processor can include: resizing the display frames; feeding the resized display frames into a pre-trained object detection model (for example, YOLOv3 or another detection model that can detect people) to generate multiple detection boxes; and feeding the detection boxes into a tracker to obtain tracking results for the characters, namely the feature block of each character in each display frame. The tracker can run a multi-object tracking algorithm on its input, such as SORT (Simple Online and Realtime Tracking).
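The association step at the heart of tracking-by-detection can be illustrated with intersection over union (IoU). The sketch below is a deliberately simplified stand-in, not SORT itself: SORT additionally predicts box motion with a Kalman filter and solves the assignment with the Hungarian algorithm, whereas this version matches greedily and drops unmatched tracks. The names `iou` and `associate` and the 0.3 threshold are assumptions for illustration.

```python
Box = tuple[float, float, float, float]  # (x0, y0, x1, y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def associate(tracks: dict[int, Box], detections: list[Box],
              threshold: float = 0.3) -> dict[int, Box]:
    """Greedily carry track IDs over to the detections of the next frame.

    Detections that match no existing track start a new track ID.
    """
    pairs = sorted(((iou(box, det), tid, di)
                    for tid, box in tracks.items()
                    for di, det in enumerate(detections)), reverse=True)
    updated: dict[int, Box] = {}
    used: set[int] = set()
    for score, tid, di in pairs:
        if score < threshold:
            break  # remaining pairs overlap too little to be the same character
        if tid in updated or di in used:
            continue
        updated[tid] = detections[di]
        used.add(di)
    next_id = max(tracks, default=-1) + 1
    for di, det in enumerate(detections):
        if di not in used:
            updated[next_id] = det
            next_id += 1
    return updated
```

Running `associate` frame by frame yields, for each track ID, the sequence of boxes that play the role of a character's feature blocks.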
In step S702, the processor obtains multiple appearance feature groups from the feature blocks of the respective characters and uses them as the preprocessed character feature groups. More specifically, for each character, the processor can take appearance features such as facial-organ features and face-shape features from that character's tracking results and combine them into an appearance feature group, which serves as the preprocessed character feature group.
In step S703, the processor separates the audio into multiple preprocessed audio tracks with different voiceprints. More specifically, the processor can use a pre-trained sound source separation model to separate the audio into multiple preprocessed audio tracks with different voiceprints. The sound source separation model can be a model trained with an AI neural network voice recognition algorithm, for example SORT. Note that the processor's preprocessing of the video frames and its preprocessing of the video audio can be performed separately or at the same time. Besides being performed after step S702 as shown in FIG. 7, step S703 can be performed before step S701, between steps S701 and S702, or simultaneously with step S701 or S702.
In step S704, the processor establishes the correspondence between the preprocessed character feature groups and the preprocessed audio tracks according to the display frames, the preprocessed character feature groups, and the preprocessed audio tracks. More specifically, this step can include taking each preprocessed audio track in turn as a target audio track and: performing facial motion detection on the feature blocks in the target frames, that is, the display frames that fall within the periods in which the target audio track carries a signal; and, according to the facial motion detection result, determining that the target audio track corresponds to one of the preprocessed character feature groups. In short, the processor can determine which character's mouth is opening and closing in the display frames while the target audio track carries a signal, and link the target audio track to that character's preprocessed character feature group. In particular, the processor first determines the periods in which the target audio track carries a signal and only then determines which character's mouth is opening and closing in the display frames. This avoids a mismatch between sound and motion, in which the character's movement appears delayed relative to the periods in which the target audio track carries a signal.
With the above structure, the infrared remote controlled video and audio device and the infrared remote control method of playing video and audio disclosed herein use the correspondence between multiple preprocessed character feature groups and multiple preprocessed audio tracks to determine the audio track of the character specified by an infrared indication signal, and can thereby play the voice of the specified character on its own.
Although the present invention is disclosed above by way of the foregoing embodiments, they are not intended to limit the present invention. Changes and modifications made without departing from the spirit and scope of the present invention fall within the scope of patent protection of the present invention. For the scope of protection defined by the present invention, please refer to the appended claims.
10: Infrared remote controlled video and audio device
11: Infrared sensor
13: Display interface
15: Audio output interface
17: Memory
19: Processor
1~9: Designated codes
F3, F4, F6: Pause screens
F31: Divided block
F61: Target block
41, 42: Target character patterns
61, 62, 63: Preselected patterns
FIG. 1 is a functional block diagram of an infrared remote controlled video and audio device according to an embodiment of the present invention.
FIG. 2 is a flow chart of an infrared remote control method of playing video and audio according to an embodiment of the present invention.
FIG. 3 is a usage scenario diagram of an infrared remote controlled video and audio device according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of a pause screen of an infrared remote controlled video and audio device according to an embodiment of the present invention.
FIG. 5 is a flow chart of an infrared remote control method of playing video and audio according to another embodiment of the present invention.
FIGS. 6A and 6B are schematic diagrams of pause screens of an infrared remote controlled video and audio device according to another embodiment of the present invention.
FIG. 7 is a preprocessing flow chart of an infrared remote control method of playing video and audio according to an embodiment of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW110134476A TW202315414A (en) | 2021-09-15 | 2021-09-15 | Infrared remote controlled video and audio device and infrared remote control method of playing video and audio |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW110134476A TW202315414A (en) | 2021-09-15 | 2021-09-15 | Infrared remote controlled video and audio device and infrared remote control method of playing video and audio |
Publications (1)
Publication Number | Publication Date |
---|---|
TW202315414A (en) | 2023-04-01 |
Family
ID=86943048
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW110134476A TW202315414A (en) | 2021-09-15 | 2021-09-15 | Infrared remote controlled video and audio device and infrared remote control method of playing video and audio |
Country Status (1)
Country | Link |
---|---|
TW (1) | TW202315414A (en) |
- 2021-09-15 TW TW110134476A patent/TW202315414A/en unknown
Similar Documents
Publication | Publication Date | Title |
---|---|---|
TWI778477B (en) | Interaction methods, apparatuses thereof, electronic devices and computer readable storage media | |
US20210249012A1 (en) | Systems and methods for operating an output device | |
JP7431291B2 (en) | System and method for domain adaptation in neural networks using domain classifiers | |
US10015385B2 (en) | Enhancing video conferences | |
WO2021109673A1 (en) | Audio and video quality enhancement method and system employing scene recognition, and display device | |
KR20190093722A (en) | Electronic apparatus, method for controlling thereof, and computer program product thereof | |
US11762905B2 (en) | Video quality evaluation method and apparatus, device, and storage medium | |
US11503375B2 (en) | Systems and methods for displaying subjects of a video portion of content | |
US20150271570A1 (en) | Audio/video system with interest-based ad selection and methods for use therewith | |
JP2011164681A (en) | Device, method and program for inputting character and computer-readable recording medium recording the same | |
US11405675B1 (en) | Infrared remote control audiovisual device and playback method thereof | |
KR20190097687A (en) | Electronic device and Method for generating summary image of electronic device | |
TW202315414A (en) | Infrared remote controlled video and audio device and infrared remote control method of playing video and audio | |
US10798459B2 (en) | Audio/video system with social media generation and methods for use therewith | |
US20150271553A1 (en) | Audio/video system with user interest processing and methods for use therewith | |
CN115811590A (en) | Mobile audio/video device and audio/video playing control method | |
US11099811B2 (en) | Systems and methods for displaying subjects of an audio portion of content and displaying autocomplete suggestions for a search related to a subject of the audio portion | |
CN114727119A (en) | Live broadcast and microphone connection control method and device and storage medium | |
US20210089781A1 (en) | Systems and methods for displaying subjects of a video portion of content and displaying autocomplete suggestions for a search related to a subject of the video portion | |
US20200204856A1 (en) | Systems and methods for displaying subjects of an audio portion of content | |
TWI777771B (en) | Mobile video and audio device and control method of playing video and audio | |
CA3104211A1 (en) | Systems and methods for displaying subjects of a portion of content | |
CN115866312A (en) | Server and subtitle position setting method | |
CN115859970A (en) | Server and subtitle generating method | |
CN108427548A (en) | User interaction approach, device, equipment based on microphone and storage medium |