TW202107250A - Communication using interactive avatars - Google Patents
- Publication number: TW202107250A
- Application number: TW109121460A
- Authority
- TW
- Taiwan
- Prior art keywords
- avatar
- user
- animation
- remote
- user input
Landscapes
- User Interface Of Digital Computer (AREA)
Abstract
Description
The following disclosure relates to video communication and, more particularly, to video communication using interactive avatars.
The increasing variety of functionality available in mobile devices has created a desire among users to communicate via video in addition to simple calls. For example, a user may initiate a "video call," "video conference," and so on, in which a camera and microphone in a device capture the user's audio and video, and the audio and video are transmitted in real time to one or more other recipients such as other mobile devices, desktop computers, video conferencing systems, and the like. Communicating video may involve the transmission of substantial amounts of data (depending on, for example, the camera technology and the particular video codec used to process the captured image data). Given the bandwidth limitations of existing 2G/3G wireless technologies and the still-limited bandwidth of emerging 4G wireless technologies, concurrent video calls by many device users can exceed the bandwidth available in the existing wireless communication infrastructure, which can negatively affect the quality of the video calls.
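A rough back-of-envelope calculation illustrates the bandwidth argument above. The figures used here (frame size, compression ratio, parameter count) are illustrative assumptions for the sketch, not values taken from this disclosure:

```python
# Illustrative, assumed figures -- not specified by this disclosure.
FPS = 24                                    # frames per second

# Live video: 320x240 pixels, 3 bytes/pixel, compressed ~50x by a codec.
raw_frame_bytes = 320 * 240 * 3
video_bps = raw_frame_bytes * FPS * 8 / 50  # bits per second after compression

# Avatar stream: ~60 animation parameters per frame, 4 bytes each.
param_frame_bytes = 60 * 4
avatar_bps = param_frame_bytes * FPS * 8

print(f"video:  {video_bps / 1000:.0f} kbit/s")   # ~885 kbit/s
print(f"avatar: {avatar_bps / 1000:.0f} kbit/s")  # ~46 kbit/s
```

Even with these conservative assumptions, transmitting animation parameters instead of encoded video reduces the required bandwidth by more than an order of magnitude.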
According to an embodiment of the present invention, a system is provided that includes: a user input device configured to capture a user input; a communication module configured to transmit and receive information; and one or more storage media that, individually or in combination, store instructions which, when executed by one or more processors, result in operations including: selecting an avatar; initiating communication; detecting a user input; identifying the user input; identifying an animation command based on the user input; generating avatar parameters; and transmitting at least one of the animation command and the avatar parameters.
100: device-to-device system/system
102, 112, 102': device/remote device
104, 114, 104'/114': camera
106, 116, 106', 116': microphone
107, 117: speaker
108, 118: touch-sensitive display/display
108', 118': display
110, 120: avatar
112': device/remote device
122: network
124, 124': server
126: system
128: virtual space
200: camera, audio, and touch-screen framework module
202: face detection and tracking module/face detection/tracking module
204: feature extraction module
206: audio transformation module
208: touch detection module
210: gesture detection module
212: avatar selection module
214: avatar control module
216: display module
218: feedback avatar
220: communication module
222: processor
300, 304: WiFi connection
302: Internet
306: enterprise AP
308: gateway
310: firewall
312: media and signal path
314: home AP
400: flowchart
402~428: operations
Features and advantages of various embodiments of the claimed subject matter will become apparent as the following detailed description proceeds, and upon reference to the drawings, in which like numerals designate like parts, and in which:

FIG. 1A illustrates an example device-to-device system according to various embodiments of the present disclosure;
FIG. 1B illustrates an example virtual space system according to various embodiments of the present disclosure;
FIG. 2 illustrates an example device according to various embodiments of the present disclosure;
FIG. 3 illustrates an example system implementation according to at least one embodiment of the present disclosure; and
FIG. 4 is a flowchart of example operations according to at least one embodiment of the present disclosure.
Although the following detailed description proceeds with reference to illustrative embodiments, many alternatives, modifications, and variations thereof will be apparent to those skilled in the art.
In general, the present disclosure describes systems and methods for video communication using interactive avatars. Using an avatar, as opposed to live images, substantially reduces the amount of data to be transmitted, and thus avatar communication requires less bandwidth. The interactive avatar is configured to enhance the user experience by modifying the display of a selected avatar based on user input. Further, user speech may be captured and transformed to produce avatar speech. The avatar speech may then be related to the user's speech but may mask the user's identity. Audio transformations may include, for example, pitch shifting and/or time stretching.
In one embodiment, an application is activated in a device coupled to a camera, a microphone, and a speaker. The application may be configured to allow a user to select an avatar for display on a remote device, in a virtual space, and so on. The device may then be configured to initiate communication with at least one other device, a virtual space, and so on. For example, the communication may be established over a 2G, 3G, or 4G cellular connection. Alternatively or additionally, the communication may be established over the Internet via a WiFi connection. After the communication is established, the camera may be configured to start capturing images and/or distances to objects, and the microphone may be configured to start capturing sound, e.g., user speech, and to convert the user speech into a user speech signal.
It may then be determined whether a user input has been detected. The user input may be captured by a user input device. User inputs include touch events captured by a touch-sensitive display and gestures captured by a camera, e.g., a depth camera configured to capture distances to objects and/or a web camera. The user input devices thus include a touch-sensitive display and/or a camera. If a user input is detected, the user input may be identified. For a touch event, the user input identifier may be related to a touch type and one or more touch locations. For a gesture (e.g., an open hand), the user input identifier may be related to a gesture identifier. An animation command may then be identified based on the user input. The animation command corresponds to a desired response associated with the user input, e.g., changing the color of the displayed avatar's face in response to a single tap on the displayed avatar's face.
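The association between an identified user input and an animation command described above can be sketched as a simple lookup table. The identifiers and command names below are hypothetical placeholders, not terms defined by this disclosure:

```python
# Hypothetical (input identifier, target feature) -> animation-command table.
ANIMATION_COMMANDS = {
    ("single_tap", "face"): "change_face_color",
    ("double_tap", "nose"): "enlarge_nose",
    ("swipe", "eyes"): "wink",
    ("gesture", "open_hand"): "wave",
}

def identify_animation_command(input_type, target):
    """Return the animation command for an identified user input,
    or None when no interactive animation is associated with it."""
    return ANIMATION_COMMANDS.get((input_type, target))

print(identify_animation_command("single_tap", "face"))  # change_face_color
```

A gesture-training utility, as described later, would amount to letting the user add or overwrite entries in such a table.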
Avatar parameters may then be generated. The avatar parameters may be generated based on face detection, head movement, and/or animation commands. The avatar parameters may thus include passive components based on, e.g., face detection and head movement, and interactive components based on animation commands. The avatar parameters may be used to animate the avatar on at least one other device, within a virtual space, and so on. In one embodiment, the avatar parameters may be generated based on face detection, head movement, and an animation command. In this embodiment, the resulting animation includes passive animation based on face detection and head movement, modified by interactive animation based on the animation command. Avatar animation may thus include passive animation based on, e.g., face detection and head movement, and interactive animation based on user input.
At least one of the animation command and the avatar parameters may then be transmitted. In one embodiment, at least one of a remote animation command and remote avatar parameters is received. The remote animation command may cause the device to determine avatar parameters based on the remote animation command in order to animate the displayed avatar. The remote avatar parameters may cause the device to animate the displayed avatar based on the received remote avatar parameters.
Audio communication may accompany the avatar animation. After the communication is established, the microphone may be configured to capture audio input (sound), e.g., user speech, and to convert the captured sound into a corresponding audio signal (e.g., a user speech signal). In one embodiment, the user speech signal may be transformed into an avatar speech signal, which may then be encoded and transmitted. A received avatar speech signal may then be converted back into sound (e.g., avatar speech) by the speaker. The avatar speech may thus be based on the user's speech and may preserve its content while altering the spectral data associated with the captured speech. Transformations include, but are not limited to, pitch shifting, time stretching, and/or altering the playback rate.
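A minimal sketch of one such transformation: resampling a signal by a constant factor shifts its pitch, although in this naive form it also changes the duration (practical systems combine resampling with a time-stretching algorithm such as WSOLA to preserve the content length). Pure-Python linear interpolation, with assumed toy data:

```python
def resample(samples, factor):
    """Naive pitch/rate shift: read the input at `factor` times the
    original rate using linear interpolation. factor > 1 raises the
    pitch and shortens the signal; factor < 1 lowers the pitch and
    lengthens it."""
    out = []
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
        pos += factor
    return out

tone = [float(i % 10) for i in range(100)]  # toy waveform
higher = resample(tone, 2.0)                # ~one octave up, half as long
lower = resample(tone, 0.5)                 # ~one octave down, twice as long
```

Applying a factor above 1.0 approximates a child-like or female voice style; a factor below 1.0 approximates a deeper male voice style, as described for the predefined voice styles later in this disclosure.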
The user input devices (e.g., a touch-sensitive display and/or a camera) may be configured to capture user inputs configured to animate the avatar on at least one other device. The user-driven animation (based on animation commands) may be in addition to animation based on facial expressions and/or head movement. Animation commands include, but are not limited to, changes in the orientation of the avatar display, distortion of facial features, changing features to convey emotion, and so on. Animation commands may thus modify the avatar animation similarly to, and/or in addition to, animation based on face detection/tracking. Animation commands may produce animations of limited duration, and may cause the resulting animation to be illustrated on a local user's displayed avatar based on input from a remote user.
Thus, a limited-bandwidth video communication system may be implemented using avatars. Audio may be transformed and video may be animated based on detected user inputs and identified animation commands, using avatar communication to enhance the user experience. Further, anonymity may be preserved using the avatars, including the audio transformation described herein.
FIG. 1A illustrates a device-to-device system 100 consistent with various embodiments of the present disclosure. The system 100 may generally include devices 102 and 112 communicating via a network 122. The device 102 includes at least a camera 104, a microphone 106, a speaker 107, and a touch-sensitive display 108. The device 112 includes at least a camera 114, a microphone 116, a speaker 117, and a touch-sensitive display 118. The network 122 includes at least one server 124.
The devices 102 and 112 may include various hardware platforms capable of wired and/or wireless communication. For example, the devices 102 and 112 may include, but are not limited to, video conferencing systems, desktop computers, laptop computers, tablet computers, smartphones (e.g., iPhones®, Android®-based phones, Blackberries®, Symbian®-based phones, Palm®-based phones, etc.), cellular handsets, and the like. The cameras 104 and 114 include any device for capturing digital images representative of an environment that includes one or more persons, and may have adequate resolution for face analysis and/or gesture recognition as described herein. For example, the cameras 104 and 114 may include still cameras (e.g., cameras configured to capture still photographs) or video cameras (e.g., cameras configured to capture moving images comprising a plurality of frames). The cameras 104 and 114 may be configured to operate using light in the visible spectrum or with other portions of the electromagnetic spectrum not limited to the infrared spectrum, ultraviolet spectrum, etc. In one embodiment, the cameras 104 and 114 may be configured to detect depth, i.e., the distance of the camera from an object and/or from points on the object. The cameras 104 and 114 may be incorporated within the devices 102 and 112, respectively, or may be separate devices configured to communicate with the devices 102 and 112 via wired or wireless communication. Specific examples of the cameras 104 and 114 may include wired (e.g., Universal Serial Bus (USB), Ethernet, Firewire, etc.) or wireless (e.g., WiFi, Bluetooth, etc.) web cameras as may be associated with computers, video monitors, etc.; depth cameras; mobile device cameras (e.g., cell phone or smartphone cameras integrated in, for example, the previously discussed example devices); integrated laptop computer cameras; integrated tablet computer cameras (e.g., iPad®, Galaxy Tab®, and the like); etc.
The devices 102 and 112 may further include microphones 106 and 116 and speakers 107 and 117. The microphones 106 and 116 include any devices configured to sense (i.e., capture) sound and to convert the sensed sound into a corresponding audio signal. The microphones 106 and 116 may be integrated within the devices 102 and 112, respectively, or may interact with the devices via wired or wireless communication, such as described in the above examples regarding the cameras 104 and 114. The speakers 107 and 117 include any devices configured to convert audio signals into corresponding sound. The speakers 107 and 117 may be integrated within the devices 102 and 112, respectively, or may interact with the devices via wired or wireless communication, such as described in the above examples regarding the cameras 104 and 114. The touch-sensitive displays 108 and 118 include any devices configured to display text, still images, moving images (e.g., video), user interfaces, graphics, etc., and configured to sense touch events such as taps, swipes, etc. A touch event may include a touch type and a touch location. The touch-sensitive displays 108 and 118 may be integrated within the devices 102 and 112, respectively, or may interact with the devices via wired or wireless communication, such as described in the above examples regarding the cameras 104 and 114. In one embodiment, the displays 108 and 118 are configured to display avatars 110 and 120, respectively. As referred to herein, an avatar is defined as a graphical representation of a user in either two dimensions (2D) or three dimensions (3D). Avatars do not have to resemble the looks of the user, and thus, while avatars can be lifelike representations, they can also take the form of drawings, cartoons, sketches, etc. In the system 100, the device 102 may display an avatar 110 representing the user of the device 112 (e.g., a remote user), and likewise, the device 112 may display an avatar 120 representing the user of the device 102. In this way, users may see a representation of other users without having to exchange the large amounts of information involved in device-to-device communication employing live images. Further, avatars may be animated based on user input. In this way, users may interact with the display of a local and/or remote avatar, thereby enhancing the user experience. The resulting animation may provide a broader range of animation than might be possible using only face detection and tracking. Further, the users may actively select the animations.
As referred to herein, avatar audio (i.e., sound) is defined as transformed user audio (sound). For example, the sound input may include the user's voice, i.e., user speech, and the corresponding avatar audio may include transformed user speech. The avatar audio may be related to the user audio. For example, the avatar speech may correspond to pitch-shifted, time-stretched, and/or otherwise transformed user speech. The avatar speech may resemble human speech or may correspond to a cartoon character, etc. In the system 100, the device 102 may emit avatar audio representing the remote user of the device 112, and similarly, the device 112 may emit avatar audio representing the audio captured by the device 102 (e.g., the speech of the local user of the device 102). In this way, users may hear a representation of the other user's voice that may be transformed.
The network 122 may include various second-generation (2G), third-generation (3G), and fourth-generation (4G) cellular-based data communication technologies, Wi-Fi wireless data communication technology, etc. The network 122 includes at least one server 124 configured to establish and maintain communication connections when using these technologies. For example, the server 124 may be configured to support Internet-related communication protocols such as the Session Initiation Protocol (SIP) for creating, modifying, and terminating two-party (unicast) and multi-party (multicast) sessions; the Interactive Connectivity Establishment (ICE) protocol for presenting a framework that allows protocols to be built on top of bytestream connections; the Session Traversal Utilities for Network Address Translator (NAT) protocol (STUN), which allows applications operating through a NAT to discover the presence of other NATs, as well as the IP addresses and ports allocated for an application's User Datagram Protocol (UDP) connections to connect to remote hosts; and Traversal Using Relays around NAT (TURN), which allows elements behind a NAT or firewall to receive data over Transmission Control Protocol (TCP) or UDP connections; etc.
FIG. 1B illustrates a virtual space system 126 consistent with various embodiments of the present disclosure. The system 126 may employ the device 102, the device 112, and the server 124. The device 102, the device 112, and the server 124 may continue to communicate in a manner similar to that illustrated in FIG. 1A, but user interaction may take place in a virtual space 128 instead of in a device-to-device format. As referred to herein, a virtual space may be defined as a digital simulation of a physical location. For example, the virtual space 128 may resemble an outdoor location like a city, road, sidewalk, field, forest, island, etc., or an indoor location like an office, house, school, mall, store, etc. Users, represented by avatars, may appear to interact in the virtual space 128 as in the real world. The virtual space 128 may exist on one or more servers coupled to the Internet, and may be maintained by a third party. Examples of virtual spaces include virtual offices, virtual meeting rooms, virtual worlds like Second Life®, massively multiplayer online role-playing games (MMORPGs) like World of Warcraft®, massively multiplayer online real-life games (MMORLGs) like The Sims Online®, etc. In the system 126, the virtual space 128 may contain a plurality of avatars corresponding to different users. Instead of displaying avatars, the displays 108 and 118 may display encapsulated (e.g., smaller) versions of the virtual space (VS) 128. For example, the display 108 may display a perspective view of what the avatar corresponding to the user of the device 102 "sees" in the virtual space 128. Similarly, the display 118 may display a perspective view of what the avatar corresponding to the user of the device 112 "sees" in the virtual space 128. Examples of what avatars might see in the virtual space 128 include, but are not limited to, virtual structures (e.g., buildings), virtual vehicles, virtual objects, virtual animals, other avatars, etc.
FIG. 2 illustrates an example device 102 according to various embodiments of the present disclosure. While only the device 102 is described, the device 112 (e.g., the remote device) may include resources configured to provide the same or similar functions. As previously discussed, the device 102 is shown including the camera 104, the microphone 106, the speaker 107, and the touch-sensitive display 108. The camera 104, the microphone 106, and the touch-sensitive display 108 may provide input to a camera, audio, and touch-screen framework module 200, and the camera, audio, and touch-screen framework module 200 may provide output (e.g., audio signals) to the speaker 107. The camera, audio, and touch-screen framework module 200 may include custom, proprietary, known, and/or after-developed audio and video processing code (or instruction sets) that is generally well-defined and operable to control at least the camera 104, the microphone 106, the speaker 107, and the touch-sensitive display 108. For example, the camera, audio, and touch-screen framework module 200 may cause the camera 104, the microphone 106, the speaker 107, and the touch-sensitive display 108 to record images, distances to objects, sounds, and/or touches; may process images, sounds, audio signals, and/or touches; may cause images and/or sounds to be reproduced; may provide audio signals to the speaker 107; etc. The camera, audio, and touch-screen framework module 200 may vary depending on the device 102, and more particularly, on the operating system (OS) running in the device 102. Example operating systems include iOS®, Android®, Blackberry® OS, Symbian®, Palm® OS, etc. The speaker 107 may receive audio information from the camera, audio, and touch-screen framework module 200 and may be configured to reproduce local sounds (e.g., to provide audio feedback of the user's voice, transformed or untransformed) and remote sounds (e.g., the sound(s), transformed or untransformed, of the other part(y/ies) engaged in a telephone call, video call, or interaction in a virtual place).
The face detection and tracking module 202 may be configured to identify and track a head, face, and/or facial region within image(s) provided by the camera 104. For example, the face detection module 202 may include custom, proprietary, known, and/or after-developed face detection code (or instruction sets), hardware, and/or firmware that is generally well-defined and operable to receive a standard-format image (e.g., but not limited to, an RGB color image) and to identify, at least to a certain extent, a face within the image. The face detection and tracking module 202 may also be configured to track the detected face through a series of images (e.g., video frames at 24 frames per second) and to determine a head position based on the detected face. Known tracking systems that may be employed by the face detection/tracking module 202 may include particle filtering, mean shift, Kalman filtering, etc., each of which may utilize edge analysis, sum-of-square-difference analysis, feature point analysis, histogram analysis, skin tone analysis, etc.
The feature extraction module 204 may be configured to recognize features (e.g., the location and/or shape of facial landmarks such as eyes, eyebrows, nose, mouth, etc.) in the face detected by the face detection module 202. In one embodiment, avatar animation may be based directly on sensed facial actions (e.g., changes in facial features), without facial expression recognition. The corresponding feature points on the avatar's face may follow or mimic the movements of the real person's face, which is known as "expression cloning" or "performance-driven facial animation." The feature extraction module 204 may include custom, proprietary, known, and/or after-developed facial characteristics recognition code (or instruction sets) that is generally well-defined and operable to receive a standard-format image (e.g., but not limited to, an RGB color image) from the camera 104 and to extract, at least to a certain extent, one or more facial characteristics in the image. Such known facial characteristics systems include, but are not limited to, the CSU Face Identification Evaluation System of Colorado State University.
The feature extraction module 204 may also be configured to recognize an expression associated with the detected features (e.g., identifying whether a previously detected face is happy, sad, smiling, frowning, surprised, excited, etc.). Thus, the feature extraction module 204 may further include custom, proprietary, known, and/or after-developed facial expression detection and/or identification code (or instruction sets) that is generally well-defined and operable to detect and/or identify expressions in a face. For example, the feature extraction module 204 may determine the size and/or position of facial features (e.g., eyes, mouth, cheeks, teeth, etc.) and may compare these facial features to a facial feature database that includes a plurality of sample facial features with corresponding facial feature classifications (e.g., smiling, frowning, excited, sad, etc.).
The audio transformation module 206 is configured to transform the user's voice into an avatar voice, i.e., a transformed user voice. Transformation includes adjusting tempo (e.g., time stretching), pitch (e.g., pitch shifting), and playback rate. For example, the audio transformation module 206 may include custom, proprietary, known, and/or after-developed audio transformation code (or instruction sets) that is generally well-defined and operable to receive voice data representative of the user's voice and to convert the voice data into transformed voice data. The voice data may be related to an audio signal based on sound captured by the microphone 106 and processed by the camera, audio, and touch-screen framework module 200. Such known voice transformation systems include, but are not limited to, the SoundTouch open-source audio processing library, configured to adjust the tempo, pitch, and playback rate of audio streams or audio files.
The audio transformation module 206 may include a plurality of predefined voice styles corresponding to transformation parameters associated with transforming the user's voice. For example, the transformation parameters may be configured to maintain a human-sounding transformed voice output with a different pitch and/or tempo. The pitch may be shifted to a higher frequency for a human female or child-like voice, the pitch may be shifted to a lower frequency for a human male voice, the tempo may be adjusted up or down to increase or decrease the speed of the speech, etc. In another example, the transformation parameters may be configured to produce a transformed voice output corresponding to an animal-like voice (e.g., a cat) and/or a cartoon-character-type voice. This may be achieved by adjusting the pitch, other frequency components, and/or sampling parameters of the user's speech.
The user may select a desired audio transformation output prior to initiating communication, and/or may select a desired audio transformation during the communication. The audio transformation module 206 may be configured to provide a sample audio transformation output in response to a request from the user. In one embodiment, the audio transformation module 206 may include a utility that allows the user to select audio transformation parameters to produce a custom audio transformation output. The utility may be configured to provide a sample transformed audio output based on the user's voice input. The user may then adjust the audio transformation parameters (e.g., by trial and error) until a suitable transformed output is achieved. The audio transformation parameters associated with the suitable output for the user may then be stored and/or utilized for avatar communication, as described herein.
The touch detection module 208 is configured to receive touch data from the camera, audio, and touch-screen framework module 200 and to identify a touch event based on the received touch data. The touch event identifier may include a touch type and/or touch location(s). The touch type may include a single tap, a double tap, a tap and hold, a tap and move, a press and stretch, a swipe, etc. Touch locations may include a touch start location, a touch end location, and/or intermediate moving touch locations, etc. The touch locations may correspond to coordinates of the touch-sensitive display 108. The touch detection module 208 may include custom, proprietary, known, and/or after-developed touch detection code (or instruction sets) that is generally well-defined and operable to receive touch data and to identify a touch event.
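The touch-event identification just described (a touch type plus locations) can be sketched as follows; the distance and timing thresholds are arbitrary illustrative values, not values from this disclosure:

```python
def identify_touch_event(start, end, duration_s, taps=1):
    """Classify raw touch data into a touch-event identifier:
    (touch_type, start_location, end_location)."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    moved = (dx * dx + dy * dy) ** 0.5 > 10   # pixels; illustrative threshold
    if not moved:
        if taps == 2:
            touch_type = "double_tap"
        elif duration_s > 0.5:
            touch_type = "tap_and_hold"
        else:
            touch_type = "single_tap"
    else:
        touch_type = "swipe" if duration_s < 0.2 else "tap_and_move"
    return (touch_type, start, end)

print(identify_touch_event((100, 100), (101, 100), 0.1))
```

The resulting identifier is what the avatar control module would map to an animation command, e.g., a single tap on the avatar's face triggering a color change.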
The gesture detection module 210 is configured to receive depth and/or image data from the camera, audio, and touch-screen framework module 200, to recognize a corresponding gesture based on the received depth and/or image data, and to determine a gesture identifier based on the recognized gesture. The depth corresponds to the distance from the camera to an object. The gesture identifier is related to the recognized gesture. The gesture detection module 210 may include custom, proprietary, known, and/or after-developed gesture detection code (or instruction sets) that is generally well-defined and operable to identify a gesture based on received depth and/or image data.
For example, the gesture detection module 210 may include a database of predefined gestures. The predefined gestures may include at least some relatively common, relatively simple gestures, including an open hand, a closed hand (i.e., a fist), a waving hand, making a circular motion with the hand, moving a hand from right to left, moving a hand from left to right, etc. Gestures may thus include static, non-moving hand gestures, active moving hand gestures, and/or combinations thereof. In one embodiment, the gesture detection module 210 may include a training utility configured to allow a user to customize the predefined gestures and/or to train new gestures. The custom gestures and/or new gestures may then be associated with gesture identifiers, and the gesture identifiers may be associated with animation commands, as described herein. For example, the user may select an animation command to associate with a gesture from a predefined list of animation commands.
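One very simple form of matching against such a predefined-gesture database is classifying the dominant direction of a tracked hand trajectory. This sketch uses assumed pixel-coordinate input and an arbitrary motion threshold:

```python
def classify_hand_gesture(trajectory):
    """Map a tracked hand trajectory (a list of (x, y) centroids from
    successive frames) to a gesture identifier from a small
    predefined set."""
    if len(trajectory) < 2:
        return "static"
    dx = trajectory[-1][0] - trajectory[0][0]
    dy = trajectory[-1][1] - trajectory[0][1]
    if abs(dx) < 20 and abs(dy) < 20:   # pixels; illustrative threshold
        return "static"
    if abs(dx) >= abs(dy):
        return "left_to_right" if dx > 0 else "right_to_left"
    return "down" if dy > 0 else "up"

print(classify_hand_gesture([(0, 0), (50, 5), (120, 10)]))  # left_to_right
```

A training utility as described above would amount to recording a few example trajectories per new gesture and storing a matching rule (or template) under a new gesture identifier.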
Thus, animation commands are related to desired responses to user inputs. An animation command may be associated with an identified user input, e.g., a touch event identifier and/or a gesture identifier. In this way, a user may interact with a displayed avatar and/or may gesture in order to modify the animation of the displayed avatar.
The avatar selection module 212 is configured to allow a user of the device 102 to select an avatar for display on a remote device. The avatar selection module 212 may include custom, proprietary, known, and/or after-developed user interface construction code (or instruction sets) that is generally well-defined and operable to present different avatars to a user so that the user may select one of the avatars. In one embodiment, one or more avatars may be predefined in the device 102. Predefined avatars allow all devices to have the same avatars, and during interaction only the selection of an avatar (e.g., the identification of a predefined avatar) needs to be communicated to the remote device or virtual space, which reduces the amount of information that needs to be exchanged. An avatar is selected prior to establishing communication, but may also be changed during the course of an active communication. Thus, it may be possible to send or receive an avatar selection at any point during the communication, and for the receiving device to change the displayed avatar in accordance with the received avatar selection.
The avatar control module 214 is configured to receive a user input identifier based on a user input to the device 102. The user input identifier may include a touch event identifier determined by the touch detection module 208 based on touch event data, or a gesture identifier determined by the gesture detection module 210. Touch event data includes a touch type and/or touch location(s). The touch locations may correspond to coordinates associated with the touch-sensitive display 108. The touch locations may be mapped to one or more points on the displayed avatar, e.g., to a feature such as the tip of the nose, mouth, lips, ears, eyes, etc. A point on the displayed avatar may be related to a desired response (i.e., an animation command) of the avatar animation.
The avatar control module 214 is configured to determine an animation command based on the user input identifier (i.e., the identified user input). The animation command is configured to identify a desired avatar animation. For example, desired animations include changing the color of the displayed avatar's face, changing the size of a feature of the displayed avatar (e.g., making the nose larger), winking, blinking, smiling, removing a feature (e.g., an ear), etc. Thus, the avatar control module 214 is configured to receive a user input identifier and to determine an animation command based on the user input identifier.
The avatar control module 214 is configured to implement avatar animations based on the animation commands. In one embodiment, for interactive animations displayed on a remote device, e.g., the device 112, the animation command may be transmitted and a remote avatar control module may then implement the animation. In another embodiment, avatar parameters configured for immediate implementation of the avatar animation may be transmitted.
An implemented interactive animation based on an animation command may have a limited duration, after which the avatar animation may return to passive animation based on, e.g., face detection and tracking as described herein. Implemented interactive animations that affect the size of a feature may be configured to change the size gradually and to return gradually to the initial size. Additionally or alternatively, animations that affect the size of a feature may be configured with an effect gradient. In other words, the relative magnitude of the size change may depend on position relative to, e.g., a key vertex. Points on the displayed avatar nearer the key vertex may experience a greater change than points on the displayed avatar that are relatively more distant.
Thus, the avatar control module 214 may receive a user input identifier based on a user input, may determine an animation command based on the user input identifier, and may implement an animation based on the animation command. The interactive animation based on the animation command may be time-limited to a time period (duration) and/or may include an effect gradient. The animation may return to passive avatar animation based on face detection and tracking after the time period.
The avatar control module 214 is configured to generate parameters for animating an avatar. Animation, as referred to herein, may be defined as altering the appearance of an image/model. Animation includes passive animation based on, e.g., facial expressions and/or head movements, and interactive animation based on user input. A single animation (which may include passive and interactive animation) may alter the appearance of a 2D still image, or multiple animations may occur in sequence to simulate motion in the image (e.g., head turn, nodding, blinking, talking, frowning, smiling, laughing, winking, etc.). Examples of animation for 3D models include deforming a 3D wireframe model, applying a texture mapping, and recalculating the model vertices for normal rendering. A change in the position of the detected face and/or the extracted facial features may be converted into parameters that cause the avatar's features to resemble the features of the user's face. In one embodiment, the general expression of the detected face may be converted into one or more parameters that cause the avatar to exhibit the same expression. The expression of the avatar may also be exaggerated to emphasize it. Knowledge of the selected avatar may not be necessary when the avatar parameters may be applied generally to all of the predefined avatars. However, in one embodiment, the avatar parameters may be specific to the selected avatar, and thus may be altered if another avatar is selected. For example, human avatars may require different parameter settings (e.g., different avatar features may be altered) than animal avatars, cartoon avatars, etc., to demonstrate emotions like happy, sad, angry, surprised, etc.
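The conversion of extracted facial measurements into avatar parameters can be sketched as normalizing each measurement against a neutral calibration. The landmark keys, scales, and parameter names below are hypothetical placeholders, not terms defined by this disclosure:

```python
def avatar_parameters(landmarks, neutral):
    """Convert extracted facial measurements into normalized avatar
    animation parameters in [0, 1], relative to a neutral calibration.
    `landmarks` and `neutral` use hypothetical keys, e.g. mouth opening
    and eyebrow height in pixels."""
    def norm(key, scale):
        delta = landmarks[key] - neutral[key]
        return max(0.0, min(1.0, delta / scale))   # clamp to [0, 1]
    return {
        "jaw_open": norm("mouth_gap", 30.0),       # scales: illustrative pixels
        "brow_raise": norm("brow_height", 10.0),
    }

neutral = {"mouth_gap": 2.0, "brow_height": 40.0}
frame = {"mouth_gap": 17.0, "brow_height": 43.0}
params = avatar_parameters(frame, neutral)
```

Because the parameters are normalized, the same parameter stream can drive any predefined avatar; an avatar-specific mapping would instead scale or remap these values per model, as the paragraph above notes.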
The avatar control module 214 may include custom, proprietary, known, and/or after-developed graphics processing code (or instruction sets) that is generally well-defined and operable to animate the avatar selected by the avatar selection module 212 based on the face/head position detected by the face detection and tracking module 202, the facial features detected by the feature extraction module 204, and/or the user input identifier determined by the touch detection module 208 and/or the gesture detection module 210. For facial-feature-based animation methods, 2D avatar animation may be done with, e.g., image warping or image morphing, whereas 3D avatar animation may be done with free-form deformation (FFD) or by utilizing an animation structure defined in a 3D model of a head. Oddcast is an example of a software resource usable for 2D avatar animation, while FaceGen is an example of a software resource usable for 3D avatar animation.
For example, for an interactive animation that includes stretching the nose of a displayed 3D avatar, a key vertex v_k related to the tip of the nose may be defined (e.g., selected). An associated 3D motion vector d_k (dx, dy, dz) and an effect radius R may be defined for the key vertex v_k. Other vertices within the effect radius R may change (i.e., move) during the interactive animation, while vertices outside the effect radius R may remain unchanged by the interactive animation. The interactive animation may have an associated duration, i.e., an animation time T, which may extend over a plurality of frames. A temporal effect parameter η_t may be defined based on the time t and the animation time T, for example as a profile that rises and then returns to zero, such as:

η_t = sin(π · t / T), for 0 ≤ t ≤ T.

Vertices within the effect radius R that are relatively nearer v_k may change relatively more than vertices relatively farther from the key vertex v_k. A spatial effect parameter η_i for a vertex v_i may be defined, for example, as:

η_i = 1 − (‖v_i − v_k‖ / R), for ‖v_i − v_k‖ < R (and η_i = 0 otherwise).

The motion vector of the vertex v_i at time t may then be defined as d_i^t = η_t · η_i · d_k. The new coordinate of the interactively animated avatar is then v_i^t = v_i^0 + d_i^t, where v_i^0 corresponds to the coordinate of the vertex v_i based on face detection and tracking, i.e., the passive animation.
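The interactive deformation described above can be sketched directly. The sinusoidal temporal profile below is one plausible choice for a gradual grow-and-return effect and the linear spatial falloff is likewise an assumption; the key-vertex/radius/duration structure follows the description:

```python
import math

def deform(vertices, v_k, d_k, R, t, T):
    """Offset avatar mesh vertices for an interactive animation.

    vertices: list of (x, y, z) passive-animation coordinates
    v_k: key vertex (e.g. the nose tip); d_k: its motion vector
    R: effect radius; t: current time; T: total animation time
    """
    eta_t = math.sin(math.pi * t / T)        # grows, then returns to 0
    out = []
    for v in vertices:
        dist = math.dist(v, v_k)
        if dist >= R:
            out.append(v)                    # outside the radius: unchanged
            continue
        eta_i = 1.0 - dist / R               # stronger effect near the key vertex
        out.append(tuple(c + eta_t * eta_i * d for c, d in zip(v, d_k)))
    return out

nose_tip = (0.0, 0.0, 0.0)
mesh = [nose_tip, (0.5, 0.0, 0.0), (5.0, 0.0, 0.0)]
# Halfway through the animation (t = T/2), the effect is at its peak.
moved = deform(mesh, nose_tip, d_k=(0.0, 0.0, 2.0), R=1.0, t=0.5, T=1.0)
```

Running this per frame for 0 ≤ t ≤ T and adding the offsets to the passive-animation coordinates yields the time-limited, gradient-weighted interactive animation described above.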
Thus, the displayed avatar may be animated with passive animation modified by interactive animation. The interactive animation may be limited to an overall duration, and the magnitude of the animation's effect may vary over that duration. The interactive animation may be configured to affect only a portion of the avatar, and the effects may be greater for points nearer the key vertex. After the interactive animation completes, animation may continue based on face detection and tracking, as described herein.
Further, in the system 100, the avatar control module 214 may receive a remote avatar selection and remote avatar parameters usable for displaying and animating an avatar corresponding to the user at a remote device. The animation may include passive animation as well as interactive animation. The avatar control module may cause a display module 216 to display the avatar 110 on the display 108. The display module 216 may include custom, proprietary, known, and/or after-developed graphics processing code (or instruction sets) that is generally well-defined and operable to display and animate an avatar on the display 108 in accordance with the example device-to-device embodiment. For example, the avatar control module 214 may receive a remote avatar selection and may interpret the remote avatar selection to correspond to a predetermined avatar. The display module 216 may then display the avatar 110 on the display 108. Moreover, remote avatar parameters received in the avatar control module 214 may be interpreted, and commands may be provided to the display module 216 to animate the avatar 110. In one embodiment, more than two users may engage in the video call. When more than two users are interacting in a video call, the display 108 may be divided or segmented to allow more than one avatar corresponding to remote users to be displayed simultaneously. Alternatively, in the system 126, the avatar control module 214 may receive information causing the display module 216 to display what the avatar corresponding to the user of the device 102 is "seeing" in the virtual space 128 (e.g., from the visual perspective of the avatar). For example, the display 108 may display buildings, objects, and animals represented in the virtual space 128, other avatars, etc.
In one embodiment, the avatar control module 214 may be configured to cause the display module 216 to display a "feedback" avatar 218. The feedback avatar 218 represents how the selected avatar appears on the remote device, in a virtual place, etc. In particular, the feedback avatar 218 appears as the avatar selected by the user and may be animated using the same parameters generated by the avatar control module 214. In this way, the user may confirm what the remote user is seeing during their interaction. The feedback avatar 218 may also be used to display interactive animations caused by inputs of the remote user of the device 112. Thus, a local user may interact with his or her feedback avatar (e.g., the avatar 218 for the user of the device 102) to cause an interactive animation of the associated avatar displayed to the remote user on the device 112. The local user may similarly interact with the displayed avatar of the remote user (e.g., the avatar 110), causing an interactive animation of the remote user's feedback avatar displayed on the device 112.
The communication module 220 is configured to transmit and receive information for selecting avatars, displaying avatars, animating avatars, displaying virtual place perspectives, etc. The communication module 220 may include custom, proprietary, known, and/or after-developed communication processing code (or instruction sets) that is generally well-defined and operable to transmit avatar selections, avatar parameters, animation commands, and interactive avatar parameters, and to receive remote avatar selections, remote avatar parameters, remote animation commands, and remote interactive avatar parameters. The communication module 220 may also transmit and receive audio information corresponding to avatar-based interactions. The communication module 220 may transmit and receive the above information via the network 122, as previously described.
The processor 222 is configured to perform operations associated with the device 102 and one or more of the modules included therein.
FIG. 3 illustrates an example system implementation according to at least one embodiment. A device 102' is configured to communicate wirelessly via a WiFi connection 300 (e.g., at work), a server 124' is configured to negotiate a connection between the devices 102' and 112' via the Internet 302, and a device 112' is configured to communicate wirelessly via another WiFi connection 304 (e.g., at home). In one embodiment, a device-to-device avatar-based video call application is activated in the device 102'. Following avatar selection, the application may allow at least one remote device (e.g., the device 112') to be selected. The application may then cause the device 102' to initiate communication with the device 112'. The communication may be initiated with the device 102' transmitting a connection establishment request to the device 112' via an enterprise access point (AP) 306. The enterprise AP 306 may be an AP usable in a business setting, and thus may support higher data throughput and more concurrent wireless clients than a home AP 314. The enterprise AP 306 may receive the wireless signal from the device 102' and may proceed to transmit the connection establishment request through various business networks via a gateway 308. The connection establishment request may then pass through a firewall 310, which may be configured to control information flowing into and out of the WiFi network 300.
The connection establishment request of device 102' may then be processed by server 124'. Server 124' may be configured for registration of IP addresses, authentication of destination addresses and NAT traversal so that the connection establishment request may be directed to the correct destination on the Internet 302. For example, server 124' may resolve the intended destination (e.g., remote device 112') from information in the connection establishment request received from device 102', and may route the signal through the correct NAT and port, and thus to the destination IP address. Depending on the network configuration, these operations may only have to be performed during connection establishment. In some instances, the operations may be repeated during the video call in order to provide notification to the NAT to keep the connection alive. Media and signal path 312 may carry the video (e.g., avatar selection and/or avatar parameters) and audio information to home AP 314 after the connection has been established. Device 112' may then receive the connection establishment request and may be configured to determine whether to accept the request. Determining whether to accept the request may include, for example, presenting a visual narrative to a user of device 112' inquiring as to whether to accept the connection request from device 102'. Should the user of device 112' accept the connection (e.g., accept the video call), the connection may be established. Cameras 104' and 114' may be configured to then start capturing images of the respective users of devices 102' and 112', respectively, for use in animating the avatar selected by each user. Microphones 106' and 116' may be configured to then start capturing audio from each user. As information exchange starts between devices 102' and 112', displays 108' and 118' may display and animate the avatars corresponding to the users of devices 102' and 112'.
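The signalling sequence above (connection request, server-side destination resolution and routing, accept/reject at the callee) can be sketched in a few lines. This is an illustrative sketch only: the class and method names (`SignalingServer`, `Device`, etc.) are assumptions for clarity, not part of the disclosure, and real NAT traversal is far more involved.

```python
# Illustrative sketch of the connection-establishment flow described above.
# All names are hypothetical; the disclosure does not define a concrete API.

class SignalingServer:
    """Stands in for server 124': registers addresses and resolves destinations."""

    def __init__(self):
        self.registry = {}  # device id -> routable address (after NAT traversal)

    def register(self, device_id, address):
        self.registry[device_id] = address

    def route_request(self, request):
        # Resolve the intended destination (e.g., device 112') from the request.
        dest = self.registry.get(request["to"])
        if dest is None:
            raise LookupError("destination not registered")
        return {"deliver_to": dest, "payload": request}


class Device:
    def __init__(self, device_id, accept_calls=True):
        self.device_id = device_id
        self.accept_calls = accept_calls

    def connection_request(self, remote_id):
        return {"from": self.device_id, "to": remote_id, "type": "connect"}

    def on_connection_request(self, request):
        # Device 112' decides whether to accept (e.g., after prompting its user).
        return "accepted" if self.accept_calls else "rejected"


server = SignalingServer()
caller, callee = Device("102"), Device("112")
server.register("102", "10.0.0.2:5000")
server.register("112", "192.168.1.7:5000")

routed = server.route_request(caller.connection_request("112"))
result = callee.on_connection_request(routed["payload"])
```

Once `result` is `"accepted"`, media would flow over path 312 rather than through the signalling server.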
FIG. 4 illustrates a flowchart 400 of exemplary operations consistent with an embodiment of the present disclosure. The operations may be performed, for example, by device 102 and/or 112. In particular, flowchart 400 depicts operations configured to implement avatar animation (including passive animation and/or interactive animation) and/or audio transformation for communication between devices over a network. It is assumed that face detection and tracking, feature extraction and passive avatar animation are implemented and operational as described herein.
An avatar model may be selected at operation 402. Avatar model selection may include a video avatar selection and an audio transformation selection. A plurality of video avatar models may be displayed, from which a user may select a desired avatar. In one embodiment, selecting a video avatar model may include an associated audio transformation. For example, a cat-like avatar may be associated with a cat-like audio transformation. In another embodiment, the audio transformation may be selected independently of the video avatar selection.
The avatar model, including the audio transformation, may be selected prior to initiating communication, but may also be changed during the course of an active communication. Thus, it may be possible to send or receive an avatar selection and/or change an audio transformation selection at any point during the communication, and the receiving device may change the displayed avatar in accordance with the received avatar selection.
Avatar communication may be initiated at operation 404. For example, a user may launch an application configured to communicate audio and video using avatars as described herein. Operation 404 may include configuring the communication and establishing a connection. Communication configuration includes identifying at least one remote device or virtual space that is to participate in the video call. For example, the user may select from a list of remote users/devices stored within the application, stored in the device in association with another system (e.g., a contacts list in a smart phone, cell phone, etc.), or stored remotely, such as on the Internet (e.g., on social media websites such as Facebook, LinkedIn, Yahoo, Google+, MSN, etc.). Alternatively, the user may select to go online in a virtual space such as Second Life.
At operation 406, a camera in the device may then begin capturing images and/or depth, and a microphone in the device may begin capturing sound. The images may be still images or live images (e.g., a plurality of images captured in sequence). Depth may be captured along with the images or may be captured independently. Depth corresponds to the distance from the camera to an object (and points on the object) within the camera's field of view. Whether user input has been detected may be determined at operation 408. User input includes gestures captured by the image and/or depth camera and touch input detected on a touch-sensitive display. If user input is detected, the user input may be identified at operation 410. A user input identifier includes a touch identifier or a gesture identifier. The touch identifier may be determined based on a touch on the touch-sensitive display and may include a touch type and a touch location. The gesture identifier may be determined based on captured image and/or depth data and may include a recognized gesture.
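One way to read the identification step at operations 408 and 410 is as a mapping from a raw event to an input identifier carrying either a touch type and location or a recognized gesture. The event shapes below are assumptions chosen for illustration; the disclosure does not prescribe a data format.

```python
# Hypothetical sketch of operations 408-410: turning a raw event into a
# user-input identifier. The event dictionaries are illustrative only.

def identify_user_input(event):
    """Return a touch identifier or a gesture identifier, or None if no input."""
    if event is None:
        return None  # operation 408: no user input detected
    if event["source"] == "touch":
        # Touch identifier: a touch type plus a touch location on the display.
        return {"kind": "touch", "type": event["type"], "location": event["xy"]}
    if event["source"] == "camera":
        # Gesture identifier: a gesture recognized from image and/or depth data.
        return {"kind": "gesture", "gesture": event["recognized_gesture"]}
    return None

tap = identify_user_input({"source": "touch", "type": "tap", "xy": (120, 48)})
wave = identify_user_input({"source": "camera", "recognized_gesture": "wave"})
```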
An animation command may be identified at operation 412. The animation command may be configured to animate the user's selected avatar displayed on the remote device, or to animate a feedback avatar of the remote user that is also displayed on the remote user's device. The animation command corresponds to a desired response associated with the user input. For example, touching the face of a displayed avatar (the user input) may result in a color change of the displayed avatar's face (the desired response identified by the animation command). The animation command may be identified based on the identified user input. For example, each user input may be related to (e.g., associated with) an animation command in a database of user input identifiers and animation commands.
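The database relationship just described, in which each user input identifier is associated with an animation command, reduces to a simple lookup table. The entries below (keys and command names alike) are invented for illustration and are not specified by the disclosure.

```python
# Hypothetical sketch of operation 412: mapping an identified user input to an
# animation command via a lookup table. All entries are illustrative only.

ANIMATION_COMMANDS = {
    # (input kind, input detail) -> animation command (the desired response)
    ("touch", "tap_face"): "change_face_color",
    ("touch", "swipe_head"): "shake_head",
    ("gesture", "wave"): "wave_back",
}

def lookup_animation_command(input_id):
    """Return the animation command for a user-input identifier, or None."""
    key = (input_id["kind"], input_id["detail"])
    return ANIMATION_COMMANDS.get(key)

cmd = lookup_animation_command({"kind": "touch", "detail": "tap_face"})
```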
Operation 414 includes generating avatar parameters. The avatar parameters include passive components and may include interactive components. If no user input has been detected, the avatar parameters may include passive components. If user input has been detected, whether the avatar parameters include interactive components depends on the animation command and, therefore, on the user input. For user input corresponding to an animation command configured to animate the user's selected avatar, the animation command may be transmitted along with avatar parameters that include only passive components, or the animation command may be applied to the avatar parameters prior to transmission so that the transmitted avatar parameters include both passive and interactive components. For input corresponding to an animation command configured to animate the remote user's feedback avatar displayed on the remote user's device, only the animation command may be transmitted.
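The two transmission cases in operation 414, sending the animation command alongside passive-only parameters versus applying it before transmission, can be sketched as follows. The dictionary representation of avatar parameters and the specific modification are assumptions for illustration.

```python
# Illustrative sketch of operation 414. Avatar parameters are represented here
# as a dictionary of passive components (e.g., from face tracking) that an
# animation command may modify; this representation is assumed, not specified.

def apply_animation_command(avatar_params, command):
    """Return a copy of the parameters with the command's interactive component merged in."""
    modified = dict(avatar_params)
    if command == "change_face_color":
        modified["face_color"] = "red"
    return modified

passive = {"mouth_open": 0.3, "head_yaw": -5.0, "face_color": "default"}

# Case 1: transmit passive-only parameters plus the command; the remote
# device applies the modification locally.
payload_remote_applies = {"params": passive, "command": "change_face_color"}

# Case 2: apply the command before transmission, so the transmitted
# parameters already include both passive and interactive components.
payload_pre_applied = {
    "params": apply_animation_command(passive, "change_face_color"),
    "command": None,
}
```

Either payload should produce the same animation on the remote device; the choice only moves where the modification is computed.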
Operation 416 includes transforming and encoding the captured audio. The captured sound may be converted into an audio signal (e.g., a user speech signal). The user speech signal may be transformed according to the audio transformation portion of the avatar selection of operation 402. The transformed user speech signal corresponds to an avatar speech signal. The avatar speech signal may be encoded using known techniques for transmission over the network to a remote device and/or virtual space. The transformed and encoded audio may be transmitted at operation 418. Operation 418 may further include transmitting at least one of an animation command and avatar parameters. Transmitting the animation command is configured to allow the remote device to animate a locally displayed avatar by modifying avatar parameters according to the animation command. Transmitted avatar parameters that have already been modified according to the animation command prior to transmission may be used directly to animate the avatar displayed on the remote device. In other words, the modification of avatar parameters represented by the animation command may be performed locally or remotely.
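A minimal sketch of the voice transformation in operation 416 is shown below, shifting pitch by simple resampling. This crude approach also changes the signal's duration; a production system would use a phase vocoder or similar technique to shift pitch and stretch time independently, as the disclosure's mention of both pitch shifting and time stretching suggests. The function name and parameters are assumptions.

```python
# Crude sketch of the user-voice -> avatar-voice transformation (operation 416).
# Resampling by a factor > 1 raises pitch but also shortens the signal; real
# implementations decouple pitch and duration. Illustrative only.

def pitch_shift(samples, factor):
    """Resample `samples` by `factor` using linear interpolation."""
    n_out = int(len(samples) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        # Linear interpolation between the two nearest input samples.
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

user_voice = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
avatar_voice = pitch_shift(user_voice, 2.0)  # roughly one octave up, half as long
```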
Operation 420 includes receiving remote encoded audio, which may be transformed audio. Operation 420 further includes receiving at least one of a remote animation command and remote avatar parameters. The remote animation command may be used to modify avatar parameters corresponding to a displayed avatar of the remote user or to a displayed feedback avatar of the local user. The animation commands and avatar parameters are configured to produce avatar animation that has been modified based on user input. At operation 422, the received audio may be decoded and played, and at operation 424, the avatar may be displayed and animated.
Animation of the displayed avatar may be based on detected and identified user input, as described herein. In the example of device-to-device communication (e.g., system 100), at least one of a remote avatar selection and remote avatar parameters may be received from the remote device. An avatar corresponding to the remote user may then be displayed based on the received remote avatar selection, and may be animated based on the received remote avatar parameters. In the example of virtual place interaction (e.g., system 126), information may be received allowing the device to display what the avatar corresponding to the device user is seeing.
Whether the communication is complete may be determined at operation 426. If the communication is complete, program flow may end at operation 428. If the communication is not complete, program flow may proceed to operation 406 to capture images, depth and/or audio.
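The control flow of flowchart 400, capture, detect input, identify a command, build parameters, transmit, and loop until the communication completes, can be condensed into a short sketch. Every helper and data shape here is a hypothetical stand-in for the operations described above, not an implementation from the disclosure.

```python
# Condensed sketch of flowchart 400 (operations 406-428). The loop structure
# mirrors the flowchart; the helpers are hypothetical stand-ins.

def run_avatar_call(frames, inputs):
    """Process one call; `frames` are captured images, `inputs` the parallel user inputs."""
    transmitted = []
    for frame, user_input in zip(frames, inputs):  # 406: capture image/depth/audio
        command = None
        if user_input is not None:                 # 408: user input detected?
            command = f"cmd:{user_input}"          # 410/412: identify input and command
        params = {"features": f"face({frame})"}    # 414: generate avatar parameters
        transmitted.append((params, command))      # 416/418: transform, encode, transmit
    return transmitted                             # 426/428: flow ends with the call

log = run_avatar_call(["f0", "f1", "f2"], [None, "tap", None])
```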
While FIG. 4 illustrates various operations according to an embodiment, it is to be understood that not all of the operations depicted in FIG. 4 are necessary for other embodiments. Indeed, it is fully contemplated herein that in other embodiments of the present disclosure, the operations depicted in FIG. 4 and/or other operations described herein may be combined in a manner not specifically shown in any of the drawings, but still fully consistent with the present disclosure. Thus, claims directed to features and/or operations that are not exactly shown in one drawing are deemed within the scope and content of the present disclosure.
As used in any embodiment herein, the term "app" may be embodied in code or instructions that may be executed on programmable circuitry such as a host processor or other programmable circuitry.
As used in any embodiment herein, the term "module" may refer to an app, software, firmware and/or circuitry configured to perform any of the aforementioned operations. Software may be embodied as a software package, code, instructions, instruction sets and/or data recorded on at least one non-transitory computer-readable storage medium. Firmware may be embodied as code, instructions or instruction sets and/or data that are hard-coded (e.g., nonvolatile) in memory devices.
As used in any embodiment herein, "circuitry" may comprise, for example, singly or in any combination, hardwired circuitry; programmable circuitry, such as computer processors comprising one or more individual instruction processing cores; state machine circuitry; and/or firmware that stores instructions executed by programmable circuitry. The modules may, collectively or individually, be embodied as circuitry that forms part of a larger system, for example, an integrated circuit (IC), a system-on-chip (SoC), a desktop computer, a laptop computer, a tablet computer, a server, a smart phone, etc.
Any of the operations described herein may be implemented in a system that includes one or more storage media having stored thereon, individually or in combination, instructions that when executed by one or more processors perform the methods. Here, the processor may include, for example, a server CPU, a mobile device CPU and/or other programmable circuitry. Also, it is intended that the operations described herein may be distributed across a plurality of physical devices, such as processing structures at more than one different physical location. The storage media may include any type of tangible media, for example, any type of disk including hard disks, floppy disks, optical disks, compact disk read-only memories (CD-ROMs), compact disk rewritables (CD-RWs) and magneto-optical disks; semiconductor devices such as read-only memories (ROMs), random access memories (RAMs) such as dynamic and static RAMs, erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), flash memories, solid state disks (SSDs), magnetic or optical cards; or any type of media suitable for storing electronic instructions. Other embodiments may be implemented as software modules executed by a programmable control device. The storage media may be non-transitory.
Thus, the present disclosure provides a method and system for animating avatars interactively in place of live images for video communication. The use of avatars reduces the amount of information to be exchanged as compared with the transmission of live images. The system and method are further configured to transform a user's voice into an avatar voice by, for example, pitch shifting and/or time stretching the captured audio signal. Interactive animation of an avatar may be based on detected user input, including touches and gestures. The interactive animation is configured to modify the animation determined based on face detection and tracking.
According to one aspect there is provided a system. The system may include a user input device configured to capture user input, a communication module configured to transmit and receive information, and one or more storage media. In addition, the one or more storage media have stored thereon, individually or in combination, instructions that when executed by one or more processors result in the following operations comprising: selecting an avatar; initiating communication; detecting user input; identifying the user input; identifying an animation command based on the user input; generating avatar parameters; and transmitting at least one of the animation command and the avatar parameters.
Another example system includes the foregoing components and further includes a microphone configured to capture sound and convert the captured sound into a corresponding audio signal, and instructions that when executed by one or more processors result in the following additional operations: capturing user speech and converting the user speech into a corresponding user speech signal; transforming the user speech signal into an avatar speech signal; and transmitting the avatar speech signal.
Another example system includes the foregoing components and further includes a camera configured to capture images, and instructions that when executed by one or more processors result in the following additional operations: capturing an image; detecting a face in the image; extracting features from the face; and converting the features into avatar parameters.
Another example system includes the foregoing components and further includes a display, and instructions that when executed by one or more processors result in the following additional operations: displaying at least one avatar; receiving at least one of a remote animation command and remote avatar parameters; and animating a displayed avatar based on the at least one of the remote animation command and the remote avatar parameters.
Another example system includes the foregoing components and further includes a speaker configured to convert an audio signal into sound, and instructions that when executed by one or more processors result in the following additional operations: receiving a remote avatar speech signal; and converting the remote avatar speech signal into avatar speech.
Another example system includes the foregoing components, and the user input device is a camera configured to capture distance and the user input is a gesture.
Another example system includes the foregoing components, and the user input device is a touch-sensitive display and the user input is a touch event.
Another example system includes the foregoing components, and the transforming comprises at least one of pitch shifting and time stretching.
According to another aspect there is provided a method. The method may include selecting an avatar; initiating communication; detecting user input; identifying the user input; identifying an animation command based on the user input; generating avatar parameters based on the animation command; and transmitting at least one of the animation command and the avatar parameters.
Another example method includes the foregoing operations and further includes capturing user speech and converting the user speech into a corresponding user speech signal; transforming the user speech signal into an avatar speech signal; and transmitting the avatar speech signal.
Another example method includes the foregoing operations and further includes capturing an image; detecting a face in the image; extracting features from the face; and converting the features into avatar parameters.
Another example method includes the foregoing operations and further includes displaying at least one avatar; receiving at least one of a remote animation command and remote avatar parameters; and animating a displayed avatar based on the at least one of the remote animation command and the remote avatar parameters.
Another example method includes the foregoing operations and further includes receiving a remote avatar speech signal; and converting the remote avatar speech signal into avatar speech.
Another example method includes the foregoing operations and the user input is a gesture.
Another example method includes the foregoing operations and the user input is a touch event.
Another example method includes the foregoing operations and the transforming comprises at least one of pitch shifting and time stretching.
According to another aspect there is provided a system. The system may include one or more storage media having stored thereon, individually or in combination, instructions that when executed by one or more processors result in the following operations comprising: selecting an avatar; initiating communication; detecting user input; identifying the user input; identifying an animation command based on the user input; generating avatar parameters; and transmitting at least one of the animation command and the avatar parameters.
Another example system includes instructions that when executed by one or more processors result in the foregoing operations and also include capturing user speech and converting the user speech into a corresponding user speech signal; transforming the user speech signal into an avatar speech signal; and transmitting the avatar speech signal.
Another example system includes instructions that when executed by one or more processors result in the foregoing operations and also include capturing an image; detecting a face in the image; extracting features from the face; and converting the features into avatar parameters.
Another example system includes instructions that when executed by one or more processors result in the foregoing operations and also include displaying at least one avatar; receiving at least one of a remote animation command and remote avatar parameters; and animating a displayed avatar based on the at least one of the remote animation command and the remote avatar parameters.
Another example system includes instructions that when executed by one or more processors result in the foregoing operations and also include receiving a remote avatar speech signal; and converting the remote avatar speech signal into avatar speech.
Another example system includes instructions that when executed by one or more processors result in the foregoing operations, and the user input is a gesture.
Another example system includes instructions that when executed by one or more processors result in the foregoing operations, and the user input is a touch event.
Another example system includes instructions that when executed by one or more processors result in the foregoing operations, and the transforming comprises at least one of pitch shifting and time stretching.
The terms and expressions that have been employed herein are used as terms of description and not of limitation, and there is no intention, in the use of such terms and expressions, to exclude any equivalents of the features shown and described (or portions thereof), and it is recognized that various modifications are possible within the scope of the claims. Accordingly, the claims are intended to cover all such equivalents.
100: device-to-device system/system

102, 112: device/remote device

104, 114: camera

106, 116: microphone

107, 117: speaker

108, 118: touch-sensitive display/display

110, 120: avatar

122: network

124: server
Claims (28)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW109121460A TW202107250A (en) | 2013-04-08 | 2013-04-08 | Communication using interactive avatars |
Publications (1)
Publication Number | Publication Date |
---|---|
TW202107250A true TW202107250A (en) | 2021-02-16 |
Family
ID=75745262