TW202107250A - Communication using interactive avatars


Info

Publication number
TW202107250A
Authority
TW
Taiwan
Prior art keywords
avatar
user
animation
remote
user input
Application number
TW109121460A
Other languages
Chinese (zh)
Inventor
童曉芬
李文龍
杜楊洲
胡威
張益明
Original Assignee
Intel Corporation
Application filed by Intel Corporation

Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

Generally this disclosure describes a video communication system that replaces actual live images of the participating users with animated avatars. A method may include selecting an avatar; initiating communication; detecting a user input; identifying the user input; identifying an animation command based on the user input; generating avatar parameters; and transmitting at least one of the animation command and the avatar parameters.

Description

Communication technology using interactive avatars (7)

Field of the Invention

The following disclosure relates to video communication, and more particularly, to video communication using interactive avatars.

Background of the Invention

The increasing variety of functionality available in mobile devices has created a desire in users to communicate via video in addition to simple voice calls. For example, a user may initiate a "video call", "videoconference", etc., wherein a camera and microphone in the device capture the user's audio and video, which are transmitted in real time to one or more other recipients such as other mobile devices, desktop computers, videoconferencing systems, etc. Video communication may involve the transmission of substantial amounts of data (e.g., depending on the camera technology and the particular video codec employed to process the captured image data). Given the bandwidth limitations of existing 2G/3G wireless technologies, and the still limited bandwidth of emerging 4G wireless technologies, concurrent video calls by many device users may exceed the bandwidth available in the existing wireless communication infrastructure, which may negatively impact the quality of the video calls.

According to one embodiment of the present invention, a system is provided that includes: a user input device configured to capture a user input; a communication module configured to transmit and receive information; and one or more storage media having stored thereon, individually or in combination, instructions that when executed by one or more processors result in operations including: selecting an avatar; initiating communication; detecting a user input; identifying the user input; identifying an animation command based on the user input; generating avatar parameters; and transmitting at least one of the animation command and the avatar parameters.

100: device-to-device system/system
102, 112, 102': device/remote device
104, 114, 104'/114': camera
106, 116, 106', 116': microphone
107, 117: speaker
108, 118: touch-sensitive display/display
108', 118': display
110, 120: avatar
112': device/remote device
122: network
124, 124': server
126: system
128: virtual space
200: camera, audio and touch-screen framework module
202: face detection and tracking module/face detection/tracking module
204: feature extraction module
206: audio transformation module
208: touch detection module
210: gesture detection module
212: avatar selection module
214: avatar control module
216: display module
218: feedback avatar
220: communication module
222: processor
300, 304: WiFi connection
302: Internet
306: enterprise AP
308: gateway
310: firewall
312: media and signal path
314: home AP
400: flowchart
402~428: operations

Features and advantages of various embodiments of the claimed subject matter will become apparent as the following detailed description proceeds, and upon reference to the drawings, in which like numerals designate like parts, and in which:

FIG. 1A illustrates an example device-to-device system in accordance with various embodiments of the present disclosure;

FIG. 1B illustrates an example virtual space system in accordance with various embodiments of the present disclosure;

FIG. 2 illustrates an example device in accordance with various embodiments of the present disclosure;

FIG. 3 illustrates an example system implementation in accordance with at least one embodiment of the present disclosure; and

FIG. 4 is a flowchart of example operations in accordance with at least one embodiment of the present disclosure.

Although the following detailed description proceeds with reference to illustrative embodiments, many alternatives, modifications and variations thereof will be apparent to those skilled in the art.

Detailed Description of the Preferred Embodiments

Generally, this disclosure describes systems and methods for video communication using interactive avatars. Using avatars, as opposed to live images, substantially reduces the amount of data to be transmitted, and thus avatar communication requires less bandwidth. Interactive avatars are configured to enhance the user experience by modifying the display of a selected avatar based on user input. Further, user speech may be captured and transformed to produce avatar speech. The avatar speech may then be related to the user speech but may mask the identity of the user. Audio transformations may include, for example, pitch shifting and/or time stretching.

In one embodiment, an application is activated in a device coupled to a camera, a microphone and a speaker. The application may be configured to allow a user to select an avatar for display on a remote device, in a virtual space, etc. The device may then be configured to initiate communication with at least one other device, a virtual space, etc. For example, the communication may be established over a 2G, 3G or 4G cellular connection. Alternatively or additionally, the communication may be established over the Internet via a WiFi connection. After the communication is established, the camera may be configured to start capturing images and/or distances to objects, and the microphone may be configured to start capturing sound, e.g., user speech, and converting the user speech into a user speech signal.

It may then be determined whether a user input has been detected. The user input may be captured by a user input device. User inputs include touch events captured by a touch-sensitive display and gestures captured by a camera, e.g., a depth camera configured to capture distances to objects and/or a web camera. The user input devices therefore include touch-sensitive displays and/or cameras. If a user input is detected, the user input may be identified. For a touch event, a user input identifier may be related to a touch type and one or more touch locations. For a gesture (e.g., an open hand), the user input identifier may be related to a gesture identifier. An animation command may then be identified based on the user input. Animation commands correspond to a desired response associated with the user input, e.g., changing the color of a displayed avatar's face in response to a single tap on the displayed avatar's face.

Avatar parameters may then be generated. The avatar parameters may be generated based on face detection, head movement and/or animation commands. The avatar parameters may thus include passive components based on, e.g., face detection and head movement, and interactive components based on animation commands. The avatar parameters may be used to animate the avatar on the at least one other device, within the virtual space, etc. In one embodiment, the avatar parameters may be generated based on face detection, head movement and animation commands. In this embodiment, the resulting animation includes passive animation based on face detection and head movement, modified by interactive animation based on the animation commands. Thus, avatar animations may include passive animations based on, e.g., face detection and head movement, and interactive animations based on user input.
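To make the passive/interactive split concrete, the following is a minimal sketch of how avatar parameters carrying both components might be assembled. The field names, command names and modifier values are illustrative assumptions; the disclosure does not specify a parameter format.

```python
# Sketch: avatar parameters blending a passive component (face detection/tracking)
# with an interactive component (an animation command). All names are assumptions.
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class AvatarParameters:
    head_pose: List[float]                      # e.g. [yaw, pitch, roll] from head tracking
    feature_points: List[List[float]]           # tracked facial landmark positions
    interactive: Dict[str, float] = field(default_factory=dict)  # e.g. {"nose_scale": 1.5}

def generate_avatar_parameters(face_tracking_result: dict,
                               animation_command: Optional[str] = None) -> AvatarParameters:
    """Combine the passive tracking result with any identified animation command."""
    params = AvatarParameters(
        head_pose=face_tracking_result["head_pose"],
        feature_points=face_tracking_result["landmarks"],
    )
    if animation_command == "enlarge_nose":
        params.interactive["nose_scale"] = 1.5      # assumed interactive modifier
    elif animation_command == "change_face_color":
        params.interactive["face_hue_shift"] = 0.2  # assumed interactive modifier
    return params
```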

At least one of the animation command and the avatar parameters may then be transmitted. In one embodiment, at least one of a remote animation command and remote avatar parameters is received. The remote animation command may cause the device to determine avatar parameters based on the remote animation command in order to animate the displayed avatar. The remote avatar parameters may cause the device to animate the displayed avatar based on the received remote avatar parameters.

Audio communication may accompany the avatar animation. After the communication is established, the microphone may be configured to capture audio input (sound), e.g., user speech, and to convert the captured sound into a corresponding audio signal (e.g., a user speech signal). In one embodiment, the user speech signal may be transformed into an avatar speech signal, which may then be encoded and transmitted. A received avatar speech signal may then be converted back into sound (e.g., avatar speech) by the speaker. The avatar speech may thus be based on the user's speech and may preserve its content while altering the spectral data associated with the captured speech. For example, transformations include, but are not limited to, pitch shifting, time stretching and/or converting the playback rate.

User input devices (e.g., touch-sensitive displays and/or cameras) may be configured to capture user inputs configured to animate the avatar on the at least one other device. The user-driven animations (based on animation commands) may be in addition to animations based on facial expressions and/or head movements. Animation commands may include, but are not limited to, changes in the orientation of the avatar display, distorting facial features, changing features to convey emotion, etc. Animation commands may thus modify avatar animations similarly to, and/or in addition to, the animations based on face detection/tracking. The animation commands may produce animations of limited duration, and the resulting animation may be illustrated on the displayed avatar of a local user based on input from a remote user.

Thus, a limited-bandwidth video communication system may be implemented using avatars. Audio may be transformed and video may be animated based on detected user inputs and identified animation commands, enhancing the user experience of avatar communication. Further, anonymity may be preserved using the avatars, including the audio transformation described herein.

FIG. 1A illustrates a device-to-device system 100 consistent with various embodiments of the present disclosure. The system 100 may generally include devices 102 and 112 communicating via a network 122. The device 102 includes at least a camera 104, a microphone 106, a speaker 107 and a touch-sensitive display 108. The device 112 includes at least a camera 114, a microphone 116, a speaker 117 and a touch-sensitive display 118. The network 122 includes at least one server 124.

The devices 102 and 112 may include various hardware platforms capable of wired and/or wireless communication. For example, the devices 102 and 112 may include, but are not limited to, videoconferencing systems, desktop computers, laptops, tablet computers, smart phones (e.g., iPhones®, Android®-based phones, Blackberries®, Symbian®-based phones, Palm®-based phones, etc.), cellular handsets, etc. The cameras 104 and 114 include any device for capturing digital images representative of an environment that includes one or more persons, and may have adequate resolution for appearance analysis and/or gesture recognition as described herein. For example, the cameras 104 and 114 may include still cameras (e.g., cameras configured to capture still photographs) or video cameras (e.g., cameras configured to capture moving images composed of a plurality of frames). The cameras 104 and 114 may be configured to operate using light in the visible spectrum or with other portions of the electromagnetic spectrum, not limited to the infrared spectrum, the ultraviolet spectrum, etc. In one embodiment, the cameras 104 and 114 may be configured to detect depth, i.e., the distance from the camera to an object and/or points on the object. The cameras 104 and 114 may be incorporated within the devices 102 and 112, respectively, or may be separate devices configured to communicate with the devices 102 and 112 via wired or wireless communication. Specific examples of the cameras 104 and 114 may include wired (e.g., Universal Serial Bus (USB), Ethernet, Firewire, etc.) or wireless (e.g., WiFi, Bluetooth, etc.) web cameras as may be associated with computers, video monitors, etc.; depth cameras; mobile device cameras (e.g., cell phone or smart phone cameras integrated in, for example, the example devices discussed previously); integrated laptop computer cameras; integrated tablet computer cameras (e.g., iPad®, Galaxy Tab® and the like), etc.

The devices 102 and 112 may further include microphones 106 and 116 and speakers 107 and 117. The microphones 106 and 116 include any devices configured to sense (i.e., capture) sound and convert the sensed sound into a corresponding audio signal. The microphones 106 and 116 may be integrated within the devices 102 and 112, respectively, or may interact with the devices via wired or wireless communication such as described in the above examples regarding the cameras 104 and 114. The speakers 107 and 117 include any devices configured to convert audio signals into corresponding sound. The speakers 107 and 117 may be integrated within the devices 102 and 112, respectively, or may interact with the devices via wired or wireless communication such as described in the above examples regarding the cameras 104 and 114. The touch-sensitive displays 108 and 118 include any devices configured to display text, still images, moving images (e.g., video), user interfaces, graphics, etc. and configured to sense touch events such as taps, swipes, etc. Touch events may include touch type and touch location. The touch-sensitive displays 108 and 118 may be integrated within the devices 102 and 112, respectively, or may interact with the devices via wired or wireless communication such as described in the above examples regarding the cameras 104 and 114. In one embodiment, the displays 108 and 118 are configured to display avatars 110 and 120, respectively. As referenced herein, an avatar is defined as a graphical representation of a user in either two dimensions (2D) or three dimensions (3D). Avatars do not have to resemble the looks of the user, and thus, while avatars can be lifelike representations, they can also take the form of drawings, cartoons, sketches, etc. In the system 100, the device 102 may display an avatar 110 representing the user of the device 112 (e.g., a remote user), and likewise, the device 112 may display an avatar 120 representing the user of the device 102. In this way, users may see a representation of other users without having to exchange the large amounts of information involved in device-to-device communication employing live images. Further, avatars may be animated based on user input. In this way, the user may interact with the display of the local and/or remote avatar, thereby enhancing the user experience. The resulting animations may provide a broader range of animation than may be possible with face detection and tracking alone. Further, the user may actively select the animations.

As referenced herein, avatar audio (i.e., sound) is defined as transformed user audio (sound). For example, the sound input may include the user's voice, i.e., user speech, and the corresponding avatar audio may include transformed user speech. The avatar audio may be related to the user audio. For example, the avatar speech may correspond to pitch-shifted, time-stretched and/or otherwise transformed user speech. The avatar speech may resemble human speech or may correspond to a cartoon character, etc. In the system 100, the device 102 may emit avatar audio representing the remote user of the device 112, and similarly, the device 112 may emit avatar audio representing the audio captured by the device 102 (e.g., the speech of the local user of the device 102). In this way, users may hear a representation of the other users' voices that may have been transformed.

The network 122 may include various second-generation (2G), third-generation (3G) and fourth-generation (4G) cellular-based data communication technologies, Wi-Fi wireless data communication technology, etc. The network 122 includes at least one server 124 configured to establish and maintain communication connections when using these technologies. For example, the server 124 may be configured to support Internet-related communication protocols such as the Session Initiation Protocol (SIP) for creating, modifying and terminating two-party (unicast) and multi-party (multicast) sessions; the Interactive Connectivity Establishment (ICE) protocol for presenting a framework that allows protocols to be built on top of byte-stream connections; the Session Traversal Utilities for Network Address Translators (NAT) protocol (STUN), which allows applications operating through a NAT to discover the presence of other NATs and the IP addresses and ports allocated for an application's User Datagram Protocol (UDP) connections to connect to remote hosts; Traversal Using Relays around NAT (TURN), which allows elements behind a NAT or firewall to receive data over Transmission Control Protocol (TCP) or UDP connections; etc.
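As one concrete illustration of the NAT-traversal support mentioned above, the sketch below sends a STUN Binding request (per RFC 5389) to discover the public IP address and port of a UDP socket. It is a generic protocol example rather than part of the disclosure, and the STUN server hostname is a placeholder.

```python
# Minimal sketch of a STUN Binding request (RFC 5389). The server hostname is an
# assumption; any reachable STUN server could be substituted.
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442

def discover_public_endpoint(stun_host="stun.example.org", stun_port=3478, timeout=2.0):
    """Send a STUN Binding request and return the (ip, port) the server observed."""
    txn_id = os.urandom(12)                                   # 96-bit transaction ID
    # Header: type=0x0001 (Binding request), length=0, magic cookie, transaction ID
    request = struct.pack("!HHI", 0x0001, 0, MAGIC_COOKIE) + txn_id

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.sendto(request, (stun_host, stun_port))
        data, _ = sock.recvfrom(2048)
    finally:
        sock.close()

    # Walk the attributes looking for XOR-MAPPED-ADDRESS (type 0x0020)
    pos = 20
    while pos + 4 <= len(data):
        attr_type, attr_len = struct.unpack("!HH", data[pos:pos + 4])
        value = data[pos + 4:pos + 4 + attr_len]
        if attr_type == 0x0020:
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            ip_bytes = bytes(b ^ m for b, m in zip(value[4:8], struct.pack("!I", MAGIC_COOKIE)))
            return socket.inet_ntoa(ip_bytes), port
        pos += 4 + attr_len + ((4 - attr_len % 4) % 4)         # attributes are 32-bit aligned
    return None

if __name__ == "__main__":
    print(discover_public_endpoint())
```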

FIG. 1B illustrates a virtual space system 126 consistent with various embodiments of the present disclosure. The system 126 may employ the device 102, the device 112 and the server 124. The device 102, the device 112 and the server 124 may continue to communicate in a manner similar to that illustrated in FIG. 1A, but user interaction may take place in a virtual space 128 instead of in a device-to-device format. As referenced herein, a virtual space may be defined as a digital simulation of a physical location. For example, the virtual space 128 may resemble an outdoor location such as a city, road, sidewalk, field, forest, island, etc., or an indoor location such as an office, house, school, mall, store, etc. Users, represented by avatars, may appear to interact in the virtual space 128 as in the real world. The virtual space 128 may exist on one or more servers coupled to the Internet, and may be maintained by a third party. Examples of virtual spaces include virtual offices, virtual meeting rooms, virtual worlds like Second Life®, massively multiplayer online role-playing games (MMORPGs) like World of Warcraft®, massively multiplayer online real-life games (MMORLGs) like The Sims Online®, etc. In the system 126, the virtual space 128 may contain a plurality of avatars corresponding to different users. Instead of displaying avatars, the displays 108 and 118 may display encapsulated (e.g., smaller) versions of the virtual space (VS) 128. For example, the display 108 may display a perspective view of what the avatar corresponding to the user of the device 102 "sees" in the virtual space 128. Similarly, the display 118 may display a perspective view of what the avatar corresponding to the user of the device 112 "sees" in the virtual space 128. Examples of what avatars might see in the virtual space 128 include, but are not limited to, virtual structures (e.g., buildings), virtual vehicles, virtual objects, virtual animals, other avatars, etc.

FIG. 2 illustrates an example device 102 in accordance with various embodiments of the present disclosure. While only the device 102 is described, the device 112 (e.g., a remote device) may include resources configured to provide the same or similar functions. As previously discussed, the device 102 is shown including the camera 104, the microphone 106, the speaker 107 and the touch-sensitive display 108. The camera 104, the microphone 106 and the touch-sensitive display 108 may provide input to a camera, audio and touch-screen framework module 200, and the camera, audio and touch-screen framework module 200 may provide output (e.g., audio signals) to the speaker 107. The camera, audio and touch-screen framework module 200 may include custom, proprietary, known and/or after-developed audio and video processing code (or instruction sets) that is generally well-defined and operable to control at least the camera 104, the microphone 106, the speaker 107 and the touch-sensitive display 108. For example, the camera, audio and touch-screen framework module 200 may cause the camera 104, the microphone 106, the speaker 107 and the touch-sensitive display 108 to record images, distances to objects, sounds and/or touches; may process images, sounds, audio signals and/or touches; may cause images and/or sounds to be reproduced; may provide audio signals to the speaker 107; etc. The camera, audio and touch-screen framework module 200 may vary depending on the device 102 and, more particularly, on the operating system (OS) running in the device 102. Example operating systems include iOS®, Android®, Blackberry® OS, Symbian®, Palm® OS, etc. The speaker 107 may receive audio information from the camera, audio and touch-screen framework module 200 and may be configured to reproduce local sounds (e.g., to provide audio feedback of the user's voice, transformed or not) and remote sounds (e.g., the voice(s) of the other party or parties, transformed or not, engaged in a telephone call, video call or interaction in a virtual place).

The face detection and tracking module 202 may be configured to identify and track a head, face and/or facial region within image(s) provided by the camera 104. For example, the face detection and tracking module 202 may include custom, proprietary, known and/or after-developed face detection code (or instruction sets), hardware and/or firmware that is generally well-defined and operable to receive a standard format image (e.g., but not limited to, an RGB color image) and to identify, at least to a certain extent, a face in the image. The face detection and tracking module 202 may also be configured to track the detected face through a series of images (e.g., video frames at 24 frames per second) and to determine a head position based on the detected face. Known tracking systems that may be employed by the face detection/tracking module 202 may include particle filtering, mean shift, Kalman filtering, etc., each of which may utilize edge analysis, sum-of-square-difference analysis, feature point analysis, histogram analysis, skin tone analysis, etc.
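A minimal sketch of this kind of per-frame face detection, using OpenCV's bundled Haar cascade, is shown below. It is an illustrative stand-in for the module described above, not the disclosed implementation; the detected rectangle center serves as a crude head-position estimate that a tracking filter (e.g., a Kalman filter) could smooth across frames.

```python
# Sketch: per-frame face detection with OpenCV's Haar cascade (ships with opencv-python).
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

capture = cv2.VideoCapture(0)            # local camera, analogous to camera 104
while True:
    ok, frame = capture.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        # (x + w/2, y + h/2) is a crude head-position estimate for this frame.
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("face tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
capture.release()
cv2.destroyAllWindows()
```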

The feature extraction module 204 may be configured to recognize features (e.g., the location and/or shape of facial landmarks such as the eyes, eyebrows, nose, mouth, etc.) in the face detected by the face detection module 202. In one embodiment, avatar animation may be based directly on sensed facial actions (e.g., changes in facial features) without facial expression recognition. The corresponding feature points on the avatar's face may follow or mimic the movements of the real person's face, which is known as "expression cloning" or "performance-driven facial animation". The feature extraction module 204 may include custom, proprietary, known and/or after-developed facial characteristics recognition code (or instruction sets) that is generally well-defined and operable to receive a standard format image (e.g., but not limited to, an RGB color image) from the camera 104 and to extract, at least to a certain extent, one or more facial characteristics in the image. Such known facial characteristics systems include, but are not limited to, the CSU Face Identification Evaluation System by Colorado State University.
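The sketch below illustrates landmark-style feature extraction with dlib's 68-point shape predictor, as a generic example of the facial-characteristics recognition described above. The model file named is dlib's standard pre-trained predictor and must be obtained separately; none of this is the disclosed feature extractor.

```python
# Sketch: facial landmark extraction with dlib's 68-point shape predictor.
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # standard dlib model

def extract_features(gray_image):
    """Return a list of (x, y) landmark points for each face detected in a grayscale image."""
    features = []
    for rect in detector(gray_image):
        shape = predictor(gray_image, rect)
        points = [(shape.part(i).x, shape.part(i).y) for i in range(shape.num_parts)]
        features.append(points)          # eyes, brows, nose, mouth outline, jaw line
    return features
```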

The feature extraction module 204 may also be configured to recognize an expression associated with the detected features (e.g., identifying whether a previously detected face is happy, sad, smiling, frowning, surprised, excited, etc.). Thus, the feature extraction module 204 may further include custom, proprietary, known and/or after-developed facial expression detection and/or identification code (or instruction sets) that is generally well-defined and operable to detect and/or identify expressions in a face. For example, the feature extraction module 204 may determine the size and/or position of facial features (e.g., eyes, mouth, cheeks, teeth, etc.) and may compare these facial features to a facial feature database that includes a plurality of sample facial features with corresponding facial feature classifications (e.g., smiling, frowning, excited, sad, etc.).

The audio transformation module 206 is configured to transform the user's voice into an avatar voice, i.e., a transformed user voice. Transformations include adjusting tempo (e.g., time stretching), pitch (e.g., pitch shifting) and playback rate. For example, the audio transformation module 206 may include custom, proprietary, known and/or after-developed audio transformation code (or instruction sets) that is generally well-defined and operable to receive voice data representative of the user's voice and to convert the voice data into transformed voice data. The voice data may be related to an audio signal based on sound captured by the microphone 106 and processed by the camera, audio and touch-screen framework module 200. Such known voice transformation systems include, but are not limited to, the SoundTouch open-source audio processing library, which is configured to adjust the tempo, pitch and playback rate of audio streams or audio files.

The audio transformation module 206 may include a plurality of predefined voice styles corresponding to transformation parameters associated with transforming the user's voice. For example, the transformation parameters may be configured to maintain a human-sounding transformed voice output with a different pitch and/or tempo. The pitch may be shifted to a higher frequency for a human female or child-like voice, the pitch may be shifted to a lower frequency for a human male voice, the tempo may be adjusted up or down to increase or decrease the speed of the speech, etc. In another example, the transformation parameters may be configured to produce a transformed voice output that corresponds to an animal-like voice (e.g., a cat) and/or a cartoon-character-like voice. This may be achieved by adjusting the pitch, other frequency components and/or sampling parameters of the user's speech.
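The following is a minimal sketch of one such transformation: a naive pitch shift implemented by resampling, together with a few voice styles expressed as transformation parameters. The style names and semitone values are illustrative assumptions; a production system would use a proper time-scale modification algorithm (e.g., as in the SoundTouch library mentioned above) so that pitch can be shifted without altering tempo.

```python
# Sketch: naive pitch shift by resampling. Raising the rate raises pitch but, as a
# side effect, shortens duration; values below are illustrative assumptions.
import numpy as np

def pitch_shift_by_resampling(samples: np.ndarray, semitones: float) -> np.ndarray:
    """Shift the pitch of a mono float signal by the given number of semitones."""
    factor = 2.0 ** (semitones / 12.0)                 # frequency ratio per semitone
    old_indices = np.arange(0, len(samples), factor)
    return np.interp(old_indices, np.arange(len(samples)), samples)

# Example predefined voice styles expressed as transformation parameters:
VOICE_STYLES = {
    "child":   {"semitones": +6},
    "male":    {"semitones": -4},
    "cartoon": {"semitones": +10},
}
```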

A user may select a desired audio transformation output prior to initiating communication, and/or may select a desired audio transformation during the communication. The audio transformation module 206 may be configured to provide a sample audio transformation output in response to a request from the user. In one embodiment, the audio transformation module 206 may include a utility that allows the user to select audio transformation parameters to produce a custom audio transformation output. The utility may be configured to provide a sample transformed audio output based on the user's voice input. The user may then adjust the audio transformation parameters (e.g., by trial and error) until a suitable transformed output is achieved. The audio transformation parameters associated with the suitable output for the user may then be stored and/or utilized for avatar communication, as described herein.

The touch detection module 208 is configured to receive touch data from the camera, audio and touch-screen framework module 200 and to identify a touch event based on the received touch data. The touch event identifier may include touch type and/or touch location(s). Touch types may include a single tap, a double tap, a tap and hold, a tap and move, a pinch and stretch, a swipe, etc. Touch locations may include a touch start location, a touch end location and/or intermediate moving touch locations, etc. The touch locations may correspond to coordinates of the touch-sensitive display 108. The touch detection module 208 may include custom, proprietary, known and/or after-developed touch detection code (or instruction sets) that is generally well-defined and operable to receive touch data and to identify a touch event.
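A sketch of reducing raw touch samples to a touch-event identifier (touch type plus locations) is shown below. The thresholds and the event vocabulary are assumptions chosen for illustration rather than values from the disclosure.

```python
# Sketch: classify one finger-down..finger-up span of touch samples into a touch event.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TouchEvent:
    touch_type: str                      # e.g. "single_tap", "tap_and_hold", "swipe"
    start: Tuple[float, float]           # display coordinates
    end: Tuple[float, float]
    path: List[Tuple[float, float]]

def identify_touch_event(samples: List[Tuple[float, float, float]]) -> TouchEvent:
    """samples: (timestamp_seconds, x, y) tuples for one contact, first to last."""
    t0, x0, y0 = samples[0]
    t1, x1, y1 = samples[-1]
    duration = t1 - t0
    distance = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    if distance < 10 and duration < 0.25:        # assumed pixel/second thresholds
        touch_type = "single_tap"
    elif distance < 10:
        touch_type = "tap_and_hold"
    elif duration < 0.3:
        touch_type = "swipe"
    else:
        touch_type = "tap_and_move"
    return TouchEvent(touch_type, (x0, y0), (x1, y1), [(x, y) for _, x, y in samples])
```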

The gesture detection module 210 is configured to receive depth and/or image data from the camera, audio and touch-screen framework module 200, to recognize a corresponding gesture based on the received depth and/or image data, and to determine a gesture identifier based on the recognized gesture. The depth corresponds to the distance from the camera to an object. The gesture identifier is related to the recognized gesture. The gesture detection module 210 may include custom, proprietary, known and/or after-developed gesture detection code (or instruction sets) that is generally well-defined and operable to identify a gesture based on the received depth and/or image data.

For example, the gesture detection module 210 may include a database of predefined gestures. The predefined gestures may include at least some relatively common, relatively simple gestures, including an open hand, a closed hand (i.e., a fist), waving a hand, making a circular motion with a hand, moving a hand from right to left, moving a hand from left to right, etc. Thus, gestures may include static, non-moving hand gestures, active moving hand gestures and/or combinations thereof. In one embodiment, the gesture detection module 210 may include a training utility configured to allow a user to customize the predefined gestures and/or to train new gestures. The customized gestures and/or new gestures may then be associated with a gesture identifier, and the gesture identifier may be associated with an animation command, as described herein. For example, the user may select the animation command to associate with a gesture from a predefined list of animation commands.
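The sketch below shows one simple way a predefined motion gesture, such as moving a hand from right to left, might be matched from a tracked sequence of hand positions. The displacement threshold is an assumption, and static hand-shape gestures (open hand, fist) or circular motions would require additional analysis not shown here.

```python
# Sketch: match tracked hand positions against two of the predefined motion gestures.
from typing import List, Optional, Tuple

MIN_DISPLACEMENT = 0.15     # fraction of the frame width, assumed threshold

def classify_motion_gesture(hand_centers: List[Tuple[float, float]]) -> Optional[str]:
    """hand_centers: per-frame normalized (x, y) hand positions in [0, 1]."""
    if len(hand_centers) < 2:
        return None
    dx = hand_centers[-1][0] - hand_centers[0][0]
    dy = hand_centers[-1][1] - hand_centers[0][1]
    if abs(dx) >= MIN_DISPLACEMENT and abs(dx) > abs(dy):
        return "hand_left_to_right" if dx > 0 else "hand_right_to_left"
    return None             # no motion gesture recognized; no gesture identifier produced
```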

Thus, animation commands are related to desired responses to user inputs. The animation commands may be associated with identified user inputs, e.g., touch event identifiers and/or gesture identifiers. In this way, a user may interact with a displayed avatar and/or may gesture in order to modify the animation of the displayed avatar.

The avatar selection module 212 is configured to allow a user of the device 102 to select an avatar for display on a remote device. The avatar selection module 212 may include custom, proprietary, known and/or after-developed user interface construction code (or instruction sets) that is generally well-defined and operable to present different avatars to the user so that the user may select one of the avatars. In one embodiment, one or more avatars may be predefined in the device 102. Predefined avatars allow all devices to have the same avatars, and during interaction only the selection of an avatar (e.g., the identification of a predefined avatar) needs to be communicated to the remote device or virtual space, which reduces the amount of information that needs to be exchanged. Avatars are selected prior to establishing communication, but may also be changed during the course of an active communication. Thus, it may be possible to send or receive an avatar selection at any point during the communication, and for the receiving device to change the displayed avatar in accordance with the received avatar selection.

The avatar control module 214 is configured to receive a user input identifier based on a user input to the device 102. The user input identifier may include a touch event identifier determined by the touch detection module 208 based on touch event data, or a gesture identifier determined by the gesture detection module 210. Touch event data includes touch type and/or touch location(s). The touch location(s) may correspond to coordinates associated with the touch-sensitive display 108. The touch location(s) may be mapped to one or more points on the displayed avatar, e.g., to a feature such as the nose tip, mouth, lips, ears, eyes, etc. The point(s) on the displayed avatar may be related to a desired response (i.e., an animation command) of the avatar animation.

The avatar control module 214 is configured to determine an animation command based on the user input identifier (i.e., the identified user input). The animation commands are configured to identify a desired avatar animation. For example, desired animations include changing the color of the displayed avatar's face, changing the size of a feature of the displayed avatar (e.g., making the nose larger), winking, blinking, smiling, removing a feature (e.g., an ear), etc. Thus, the avatar control module 214 is configured to receive the user input identifier and to determine the animation command based on the user input identifier.
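The determination described above is essentially a lookup from a user input identifier to an animation command, as sketched below. The identifier tuples and command names are illustrative assumptions.

```python
# Sketch: map a user input identifier (touch type plus the avatar feature hit, or a
# gesture identifier) to an animation command. Names and mappings are assumptions.
from typing import Optional, Tuple

ANIMATION_COMMANDS = {
    ("single_tap", "face"):  "change_face_color",
    ("single_tap", "nose"):  "enlarge_nose",
    ("swipe", "eye"):        "wink",
    ("hand_left_to_right",): "smile",
    ("hand_right_to_left",): "remove_ears",
}

def determine_animation_command(user_input_identifier: Tuple[str, ...]) -> Optional[str]:
    """Return the animation command for an identified user input, if one is defined."""
    return ANIMATION_COMMANDS.get(user_input_identifier)

# e.g. determine_animation_command(("single_tap", "nose")) -> "enlarge_nose"
```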

The avatar control module 214 is configured to implement avatar animations based on the animation commands. In one embodiment, for interactive animations displayed on a remote device, e.g., the device 112, the animation command may be transmitted and the remote avatar control module may then implement the animation. In another embodiment, avatar parameters configured for immediate implementation of the avatar animation may be transmitted.

An implemented interactive animation based on an animation command may have a finite duration, after which the avatar animation may return to passive animation based on, e.g., face detection and tracking as described herein. Implemented interactive animations that affect the size of a feature may be configured to change the size gradually and to return gradually to the initial size. Additionally or alternatively, animations that affect the size of a feature may be configured to have an effect gradient. In other words, the relative magnitude of the change in size may depend on position relative to, e.g., a key vertex. Points on the displayed avatar closer to the key vertex may experience greater change than points on the displayed avatar relatively farther away.

Thus, the avatar control module 214 may receive a user input identifier based on a user input, may determine an animation command based on the user input identifier, and may implement an animation based on the animation command. Interactive animations based on animation commands may be time-limited to a time period (duration) and/or may include an effect gradient. The animation may return to passive avatar animation based on face detection and tracking after the time period.

The avatar control module 214 is configured to generate parameters for animating an avatar. Animation, as referenced herein, may be defined as altering the appearance of an image/model. Animations include passive animations based on, e.g., facial expressions and/or head movements, and interactive animations based on user input. A single animation (which may include passive and interactive animations) may alter the appearance of a 2-D still image, or multiple animations may occur in sequence to simulate motion in the image (e.g., head turn, nodding, blinking, talking, frowning, smiling, laughing, winking, etc.). Examples of animation for 3-D models include deforming a 3-D wireframe model, applying a texture mapping, and re-computing the model vertices for rendering. A change in the position of the detected face and/or extracted facial features may be converted into parameters that cause the avatar's features to resemble the features of the user's face. In one embodiment, the general expression of the detected face may be converted into one or more parameters that cause the avatar to exhibit the same expression. The expression of the avatar may also be exaggerated to emphasize the expression. Knowledge of the selected avatar may not be necessary when avatar parameters may be applied generally to all of the predefined avatars. However, in one embodiment, avatar parameters may be specific to the selected avatar, and thus may be altered if another avatar is selected. For example, human avatars may require different parameter settings (e.g., different avatar features may be altered) than animal avatars, cartoon avatars, etc. to demonstrate emotions like happiness, sadness, anger, surprise, etc.

The avatar control module 214 may include custom, proprietary, known and/or after-developed graphics processing code (or instruction sets) that is generally well-defined and operable to animate the avatar selected by the avatar selection module 212 based on the face/head position detected by the face detection and tracking module 202, the facial features detected by the feature extraction module 204 and/or the user input identifier determined by the touch detection module 208 and/or the gesture detection module 210. For facial-feature-based animation methods, 2-D avatar animation may be done with, e.g., image warping or image morphing, whereas 3-D avatar animation may be done with free-form deformation (FFD) or by utilizing an animation structure defined in a 3-D model of a head. Oddcast is an example of a software resource usable for 2-D avatar animation, while FaceGen is an example of a software resource usable for 3-D avatar animation.

For example, for an interactive animation that includes stretching the nose of a displayed 3-D avatar, a key vertex v_k related to the nose tip may be defined (e.g., selected). An associated 3-D motion vector d_k (dx, dy, dz) and an effect radius R may be defined for the key vertex v_k. Other vertices within the effect radius R may change (i.e., move) in the interactive animation, while vertices outside the effect radius R may remain unchanged by the interactive animation. The interactive animation may have an associated duration, an animation time T, which may extend over a plurality of frames. A temporal effect parameter η_t may be defined based on the time t and the animation time T (the defining equation appears only as an image in this publication and is not reproduced here). Vertices within the effect radius R that are relatively closer to v_k may change relatively more than vertices relatively farther from the key vertex v_k; a spatial effect parameter η_i for a vertex v_i may be defined accordingly (equation likewise not reproduced). The motion vector of vertex v_i at time t may then be defined in terms of η_t, η_i and d_k, and the new coordinates of the interactively animated avatar are obtained by combining this motion vector with the coordinates of vertex v_i based on face detection and tracking, i.e., the passive animation.

Thus, an animation may be implemented on a displayed avatar that includes passive animation modified by interactive animation. The interactive animation may be limited to an overall duration, and the magnitude of the effects of the animation may vary over that duration. The interactive animation may be configured to affect only a portion of the avatar, and the effects may be greater for points nearer to the key vertex. After the interactive animation is complete, the animation may continue based on face detection and tracking as described herein.
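A sketch of the interactive vertex deformation described above is given below. Because the defining equations for η_t and η_i are not reproduced in this text, the sketch assumes a simple symmetric ramp for the temporal parameter and a linear falloff for the spatial parameter; only the overall structure (key vertex, key motion vector, effect radius, animation time, passive coordinates) follows the description.

```python
# Sketch: time-limited interactive deformation layered on top of passively animated
# vertices. The eta_t and eta_i formulas here are assumptions, not the disclosed ones.
import numpy as np

def interactive_offsets(passive_vertices: np.ndarray,
                        key_vertex: np.ndarray,
                        key_motion: np.ndarray,
                        effect_radius: float,
                        t: float,
                        animation_time: float) -> np.ndarray:
    """Return per-vertex offsets to add to the passively animated vertex positions."""
    # Assumed temporal envelope: ramps up to 1 at T/2, back to 0 at T.
    eta_t = 1.0 - abs(2.0 * t / animation_time - 1.0) if 0.0 <= t <= animation_time else 0.0
    # Assumed spatial falloff: 1 at the key vertex, 0 at the effect radius and beyond.
    distances = np.linalg.norm(passive_vertices - key_vertex, axis=1)
    eta_i = np.clip(1.0 - distances / effect_radius, 0.0, 1.0)
    return eta_t * eta_i[:, None] * key_motion          # one offset per vertex

# Usage: new_vertices = passive_vertices + interactive_offsets(
#     passive_vertices, nose_tip, np.array([0.0, 0.0, 2.0]),
#     effect_radius=1.5, t=frame_time, animation_time=1.0)
```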

Further, in the system 100, the avatar control module 214 may receive a remote avatar selection and remote avatar parameters usable for displaying and animating an avatar corresponding to the user at the remote device. The animations may include passive animations as well as interactive animations. The avatar control module may cause a display module 216 to display the avatar 110 on the display 108. The display module 216 may include custom, proprietary, known and/or after-developed graphics processing code (or instruction sets) that is generally well-defined and operable to display and animate an avatar on the display 108 in accordance with the example device-to-device embodiment. For example, the avatar control module 214 may receive a remote avatar selection and may interpret the remote avatar selection to correspond to a predetermined avatar. The display module 216 may then display the avatar 110 on the display 108. In addition, remote avatar parameters received in the avatar control module 214 may be interpreted, and commands may be provided to the display module 216 to animate the avatar 110. In one embodiment, more than two users may engage in a video call. When more than two users are interacting in a video call, the display 108 may be divided or segmented to allow more than one avatar corresponding to remote users to be displayed simultaneously. Alternatively, in the system 126, the avatar control module 214 may receive information causing the display module 216 to display what the avatar corresponding to the user of the device 102 is "seeing" in the virtual space 128 (e.g., from the visual perspective of the avatar). For example, the display 108 may display buildings, objects, animals represented in the virtual space 128, other avatars, etc.

In one embodiment, the avatar control module 214 may be configured to cause the display module 216 to display a "feedback" avatar 218. The feedback avatar 218 represents how the selected avatar appears on the remote device, in a virtual place, etc. In particular, the feedback avatar 218 appears as the avatar selected by the user and may be animated using the same parameters generated by the avatar control module 214. In this way, the user may confirm what the remote user is seeing during their interaction. The feedback avatar 218 may also be used to display interactive animations caused by remote user inputs at the device 112. Thus, a local user may interact with his or her feedback avatar (e.g., the avatar 218 for the user of the device 102) to cause an interactive animation of the associated avatar to be displayed to the remote user on the device 112. The local user may similarly interact with the displayed avatar of the remote user (e.g., the avatar 110), causing an interactive animation of the remote user's feedback avatar to be displayed on the device 112.

The communication module 220 is configured to transmit and receive information for selecting avatars, displaying avatars, animating avatars, displaying a virtual place perspective, etc. The communication module 220 may include custom, proprietary, known and/or after-developed communication processing code (or instruction sets) that is generally well-defined and operable to transmit avatar selections, avatar parameters, animation commands and interactive avatar parameters, and to receive remote avatar selections, remote avatar parameters, remote animation commands and remote interactive avatar parameters. The communication module 220 may also transmit and receive audio information corresponding to the avatar-based interactions. The communication module 220 may transmit and receive the above information via the network 122, as previously described.
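As an illustration of the kinds of information such a communication module exchanges, the sketch below serializes avatar selections, animation commands and avatar parameters as JSON messages. The message schema is an assumption for illustration; the disclosure does not define a wire format.

```python
# Sketch: JSON message builders for avatar-based communication over an established
# connection. Field names are assumptions, not a specified protocol.
import json

def make_avatar_selection_message(avatar_id: int) -> bytes:
    return json.dumps({"type": "avatar_selection", "avatar_id": avatar_id}).encode()

def make_animation_command_message(command: str) -> bytes:
    return json.dumps({"type": "animation_command", "command": command}).encode()

def make_avatar_parameters_message(head_pose, landmarks, interactive=None) -> bytes:
    return json.dumps({
        "type": "avatar_parameters",
        "head_pose": head_pose,        # passive component, e.g. [yaw, pitch, roll]
        "landmarks": landmarks,        # passive component from face tracking
        "interactive": interactive,    # interactive component from an animation command
    }).encode()

# e.g., over an existing socket: sock.sendall(make_animation_command_message("enlarge_nose"))
```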

處理器222係組配來執行與裝置102及其中所包括模組的一或多者相關聯之操作。 The processor 222 is configured to perform operations associated with the device 102 and one or more of the modules included therein.

圖3例示根據至少一實施例之示例系統實行方案。裝置102'係組配來經由WiFi連接300來無線地通訊（例如在工作時），伺服器124'係組配來經由網際網路302協商裝置102'與112'之間的連接，且裝置112'係組配來經由另一WiFi連接304來無線地通訊（例如在家時）。在一實施例中，基於裝置至裝置化身之視訊呼叫應用程式在裝置102'中啟動。在化身選擇之後，應用程式可允許選擇至少一遠程裝置（例如裝置112'）。應用程式可隨後使裝置102'起始與裝置112'之通訊。通訊可以裝置102'經由企業存取點(AP)306傳輸連接建立請求至裝置112'來起始。企業AP 306可為可用於商業設置之AP，且因此可支援比家AP 314高的資料通量及更多的並行無線客戶端。企業AP 306可接收來自裝置102'之無線信號，且可經由各種商用網路，經由閘道308進行對連接建立請求的傳輸。連接建立請求可隨後通過防火牆310，該防火牆可組配來控制流入及流出WiFi網路300之資訊。 Figure 3 illustrates an example system implementation according to at least one embodiment. The device 102' is configured to communicate wirelessly via a WiFi connection 300 (for example, at work), the server 124' is configured to negotiate the connection between the devices 102' and 112' via the Internet 302, and the device 112' is configured to communicate wirelessly via another WiFi connection 304 (for example, at home). In one embodiment, a device-to-device avatar-based video call application is activated in the device 102'. After avatar selection, the application may allow at least one remote device (for example, the device 112') to be selected. The application can then cause the device 102' to initiate communication with the device 112'. The communication may be initiated by the device 102' transmitting a connection establishment request to the device 112' via an enterprise access point (AP) 306. The enterprise AP 306 may be an AP usable in a business setting, and therefore may support higher data throughput and more concurrent wireless clients than the home AP 314. The enterprise AP 306 can receive the wireless signal from the device 102', and can pass the connection establishment request through the gateway 308 onto various business networks. The connection establishment request can then pass through a firewall 310, which can be configured to control information flowing into and out of the WiFi network 300.

裝置102'之連接建立請求可隨後藉由伺服器124'處理。伺服器124'可組配來登記IP位址、鑑別目的地位址及NAT穿越，以便連接建立請求可導向網際網路302上的正確目的地。例如，伺服器124'可自接收自裝置102的連接建立請求中的資訊來解析所欲之目的地（例如遠程裝置112'），且可將信號安排路由傳遞穿過正確NAT、埠及因此到達目的地IP位址。此等操作可僅必須在連接建立期間執行，此取決於網路組態。在一些情況下，可在視訊呼叫期間重複操作以便向NAT提供通知來保持連接有效。媒體及信號路徑312可在已建立連接之後將視訊（例如化身選擇及/或化身參數）及音訊資訊攜帶至家AP 314。裝置112'可隨後接收連接建立請求且可組配來判定是否接受該請求。判定是否接受該請求可包括例如向查詢關於是否接收來自裝置102'之連接請求的裝置112'之使用者呈現視覺敘事。裝置112'之使用者接收該連接（例如，接收該視訊呼叫），即可建立該連接。攝影機104'及114'可組配來隨後開始分別擷取裝置102'及112'之各自使用者的影像，以用於藉由各使用者選擇之化身成動畫。麥克風106'及116'可組配來隨後開始擷取來自各使用者之音訊。當在裝置102'及112'之間開始資訊交換時，顯示器108'及118'可顯示相應於裝置102'及112'之使用者的化身且使該等化身成動畫。 The connection establishment request of the device 102' can then be processed by the server 124'. The server 124' can be configured to register IP addresses, authenticate destination addresses, and perform NAT traversal, so that the connection establishment request can be directed to the correct destination on the Internet 302. For example, the server 124' can resolve the intended destination (for example, the remote device 112') from information in the connection establishment request received from the device 102', and can route signals through the correct NATs and ports to the destination IP address. These operations may only need to be performed during connection establishment, depending on the network configuration. In some cases, the operations can be repeated during the video call in order to provide notification to the NAT so as to keep the connection alive. A media and signal path 312 can carry video (for example, avatar selection and/or avatar parameters) and audio information to the home AP 314 after the connection has been established. The device 112' can then receive the connection establishment request and can be configured to determine whether to accept the request. Determining whether to accept the request may include, for example, presenting a visual narrative to the user of the device 112' inquiring as to whether to accept the connection request from the device 102'. Should the user of the device 112' accept the connection (for example, accept the video call), the connection can be established. The cameras 104' and 114' can be configured to then begin capturing images of the respective users of the devices 102' and 112', respectively, for use in animating the avatar selected by each user. The microphones 106' and 116' can be configured to then begin capturing audio from each user. As information exchange begins between the devices 102' and 112', the displays 108' and 118' can display and animate the avatars corresponding to the users of the devices 102' and 112'.
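
By way of illustration only, a toy Python sketch of the registration and lookup role attributed to the server 124' appears below. The names are invented for this example, and NAT traversal, firewalls, and access points are omitted; it is a sketch of the signaling flow, not a definitive implementation.

```python
# Toy signaling sketch. The server's registration/lookup role loosely mirrors
# the description of server 124'; NAT traversal and firewalls are omitted.

class SignalingServer:
    def __init__(self):
        self.registry = {}  # user id -> (ip, port)

    def register(self, user_id, ip, port):
        self.registry[user_id] = (ip, port)

    def route_connection_request(self, caller, callee):
        # Resolve the intended destination from the connection-establishment request.
        if callee not in self.registry:
            raise LookupError(f"{callee} is not registered")
        return {"from": caller, "to": callee, "dest_addr": self.registry[callee]}


if __name__ == "__main__":
    server = SignalingServer()
    server.register("device_102", "203.0.113.10", 5004)   # e.g., behind an enterprise AP
    server.register("device_112", "198.51.100.7", 5004)   # e.g., behind a home AP
    request = server.route_connection_request("device_102", "device_112")
    print(request)  # the callee may then accept or decline the call
```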

圖4例示與本揭示案之一實施例一致的示範性操作的流程圖400。該等操作可例如藉由裝置102及/或112執行。詳言之，流程圖400描繪組配來實行化身動畫（包括被動動畫及/或互動動畫）及/或音訊轉換以用於裝置之間經由網路的通訊的操作。假定面部偵測及追蹤、特徵提取及被動化身動畫如本文所述加以實行及操作。 FIG. 4 illustrates a flowchart 400 of exemplary operations consistent with an embodiment of the present disclosure. The operations may be performed, for example, by the device 102 and/or 112. In particular, the flowchart 400 depicts exemplary operations configured to implement avatar animation (including passive animation and/or interactive animation) and/or audio conversion for communication between devices via a network. It is assumed that facial detection and tracking, feature extraction, and passive avatar animation are implemented and operate as described herein.

化身模型可在操作402選擇。化身模型可包括視訊化身選擇及音訊轉換選擇。可顯示多個視訊化身模型，使用者可自該等視訊化身模型選擇一所要化身。在一實施例中，選擇視訊化身模型可包括相關聯音訊轉換。例如，如貓的化身可與如貓的音訊轉換相關聯。在另一實施例中，音訊轉換可獨立於該視訊化身選擇來選擇。 An avatar model may be selected at operation 402. The avatar model may include a video avatar selection and an audio conversion selection. A plurality of video avatar models may be displayed, from which the user may select a desired avatar. In one embodiment, selecting a video avatar model may include an associated audio conversion. For example, a cat-like avatar may be associated with a cat-like audio conversion. In another embodiment, the audio conversion may be selected independently of the video avatar selection.

包括音訊轉換之化身模型可在啟動通訊之前選擇,但亦可在活動通訊的過程中加以改變。因此,可能於通訊期間任何點處發送或接收化身選擇及/或改變音訊轉換選擇,且接收裝置可能根據所接收之化身選擇來改變所顯示化身。 The avatar model including audio conversion can be selected before starting the communication, but it can also be changed during the active communication. Therefore, it is possible to send or receive the avatar selection and/or change the audio conversion selection at any point during the communication, and the receiving device may change the displayed avatar according to the received avatar selection.
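
By way of illustration only, one possible way to represent an avatar model that bundles a video avatar with an associated audio conversion is sketched below in Python; the field names and catalog entries are assumptions for this example.

```python
# Hypothetical avatar-model records; names and values are illustrative only.
from dataclasses import dataclass

@dataclass
class AudioTransform:
    pitch_shift_semitones: float  # e.g., raise the pitch for a cat-like voice
    time_stretch_factor: float    # > 1.0 lengthens the captured audio

@dataclass
class AvatarModel:
    name: str
    mesh_file: str
    audio_transform: AudioTransform  # may also be selected independently

CATALOG = {
    "cat": AvatarModel("cat", "cat.mesh", AudioTransform(+6.0, 1.0)),
    "robot": AvatarModel("robot", "robot.mesh", AudioTransform(-4.0, 1.1)),
}

selected = CATALOG["cat"]  # operation 402: the user picks an avatar model
print(selected)
```

The selection could be re-sent at any point of an active call, which is consistent with the paragraph above about changing the avatar model during communication.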

化身通訊可在操作404啟動。例如，使用者可運行組配來使用如本文所述化身傳達音訊及視訊之應用程式。操作404可包括組配通訊及建立連接。通訊組態包括識別參與視訊呼叫之至少一遠程裝置或虛擬空間。例如，使用者可自儲存於應用程式內、儲存於與另一系統相關聯的裝置內（例如智慧型電話、手機等等中的聯絡人清單）、遠程儲存於諸如網際網路（例如，如Facebook、LinkedIn、Yahoo、Google+、MSN等等的社交媒體網站）上的遠程使用者/裝置之清單中進行選擇。或者，使用者可選擇在如Second Life的虛擬空間中進行線上操作。 Avatar communication may be initiated at operation 404. For example, the user may launch an application configured to communicate audio and video using avatars as described herein. Operation 404 may include configuring the communication and establishing a connection. Communication configuration includes identifying at least one remote device or a virtual space to participate in the video call. For example, the user may select from a list of remote users/devices stored within the application, stored in association with another system (for example, a contacts list in a smart phone, cell phone, etc.), or stored remotely, such as on the Internet (for example, on social media websites such as Facebook, LinkedIn, Yahoo, Google+, MSN, and the like). Alternatively, the user may choose to go online in a virtual space such as Second Life.
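
By way of illustration only, the configuration portion of operation 404 might resemble the following Python sketch, in which the call target is resolved from a locally stored contact list or a virtual space name; the data and function names are illustrative assumptions.

```python
# Sketch of the configuration step of operation 404: pick a call target from a
# locally stored contact list or choose a virtual space. The data is illustrative.

CONTACTS = {"alice": "device_112", "bob": "device_131"}   # e.g., a phone contact list
VIRTUAL_SPACES = ["second_life_plaza"]

def configure_communication(target):
    if target in CONTACTS:
        return {"mode": "device_to_device", "remote_device": CONTACTS[target]}
    if target in VIRTUAL_SPACES:
        return {"mode": "virtual_space", "space": target}
    raise ValueError(f"unknown call target: {target}")

print(configure_communication("alice"))
```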

在操作406，裝置中之攝影機可隨後開始擷取影像及/或深度，且裝置中之麥克風可開始擷取聲音。影像可為靜止影像或活動影像（例如，依次擷取的多個影像）。深度可與影像一起擷取或可獨立地擷取。深度相應於攝影機之視場中攝影機至物體（及物體上之點）的距離。可在操作408判定是否偵測到使用者輸入。使用者輸入包括藉由影像及/或深度攝影機擷取的手勢及在觸摸感應顯示器上偵測到之觸摸輸入。若偵測到使用者輸入，則可在操作410識別使用者輸入。使用者輸入識別符包括觸摸識別符或手勢識別符。觸摸識別符可基於對觸摸感應顯示器的觸摸來判定且可包括觸摸類型及觸摸位置。手勢識別符可基於所擷取影像及/或深度資料來判定且可包括辨識手勢。 At operation 406, the camera in the device may then begin capturing images and/or depth, and the microphone in the device may begin capturing sound. An image may be a still image or a moving image (for example, a plurality of images captured in sequence). Depth may be captured together with the images or may be captured independently. Depth corresponds to the distance from the camera to an object (and points on the object) in the camera's field of view. Whether a user input is detected may be determined at operation 408. User input includes gestures captured by the image and/or depth camera and touch input detected on the touch-sensitive display. If a user input is detected, the user input may be identified at operation 410. A user input identifier includes a touch identifier or a gesture identifier. The touch identifier may be determined based on a touch to the touch-sensitive display and may include a touch type and a touch location. The gesture identifier may be determined based on captured image and/or depth data and may include a recognized gesture.
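
By way of illustration only, operations 408 and 410 can be pictured as the following Python sketch, which classifies a detected event into a touch identifier (touch type plus touch location) or a gesture identifier; the event dictionary format is an assumption made for this example.

```python
# Assumed event dictionaries; a real system would receive these from the
# touch-screen driver or from image/depth-based gesture recognition.

def identify_user_input(event):
    """Return a user input identifier for a touch event or a recognized gesture."""
    if event["source"] == "touch":
        # Touch identifier: touch type and touch location (operation 410).
        return {"kind": "touch", "type": event["type"], "location": event["xy"]}
    if event["source"] == "depth_camera":
        # Gesture identifier: the recognized gesture derived from image/depth data.
        return {"kind": "gesture", "gesture": event["gesture"]}
    return None  # no user input detected (operation 408 falls through)


print(identify_user_input({"source": "touch", "type": "tap", "xy": (120, 88)}))
print(identify_user_input({"source": "depth_camera", "gesture": "wave"}))
```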

可在操作412識別動畫命令。動畫命令可組配來使顯示於遠程裝置上的使用者之所選擇化身成動畫,或使亦顯示於遠程使用者之裝置上的遠程使用者之回饋化身成動畫。動畫命令相應於與使用者輸入相關聯的所要響應。例如,觸摸所顯示化身的臉部(使用者輸入)可產生所顯示化身的臉部之顏色改變(藉由動畫命令識別的所要響應)。動畫命令可基於所識別之使用者輸入來識別。例如,各使用者輸入可與具有使用者輸入識別符及動畫命令之資料庫中的動畫命令有關(例如與之相關聯)。 The animation command can be recognized at operation 412. The animation command can be configured to animate the selected avatar of the user displayed on the remote device, or animate the feedback avatar of the remote user also displayed on the remote user's device. The animation command corresponds to the desired response associated with the user input. For example, touching the face of the displayed avatar (user input) can produce a change in the color of the face of the displayed avatar (a desired response recognized by an animation command). Animation commands can be recognized based on recognized user input. For example, each user input may be related to (e.g., associated with) animation commands in a database with user input identifiers and animation commands.
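
By way of illustration only, the association between user input identifiers and animation commands may be kept in a simple lookup table, as sketched below in Python; the specific commands are invented examples rather than an exhaustive list from the disclosure.

```python
# Hypothetical mapping from user input identifiers to animation commands.
ANIMATION_COMMANDS = {
    ("touch", "tap", "face"): "blush_face",   # e.g., change the face color
    ("touch", "drag", "ear"): "wiggle_ears",
    ("gesture", "wave", None): "wave_back",
}

def lookup_animation_command(input_id, hit_region=None):
    """Resolve the desired response (animation command) for a recognized input."""
    if input_id["kind"] == "touch":
        key = ("touch", input_id["type"], hit_region)
    else:
        key = ("gesture", input_id["gesture"], None)
    return ANIMATION_COMMANDS.get(key)  # None means no interactive animation

print(lookup_animation_command({"kind": "touch", "type": "tap"}, hit_region="face"))
```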

操作414包括產生化身參數。化身參數包括被動組件且可包括互動組件。若未偵測到使用者輸入，則化身參數可包括被動組件。若偵測到使用者輸入，則化身參數是否可包括互動組件取決於動畫命令並因此取決於使用者輸入。對於相應於組配來使使用者之所選擇化身成動畫的動畫命令之使用者輸入而言，動畫命令可與僅包括被動組件之化身參數一起傳輸或可在傳輸之前應用於化身參數，以便所傳輸之化身參數包括被動組件及互動組件。對於相應於組配來使顯示於遠程使用者之裝置上的遠程使用者之回饋化身成動畫的動畫命令之輸入而言，可僅傳輸動畫命令。 Operation 414 includes generating avatar parameters. The avatar parameters include a passive component and may include an interactive component. If no user input is detected, the avatar parameters may include only the passive component. If a user input is detected, whether the avatar parameters include an interactive component depends on the animation command and therefore on the user input. For user input corresponding to an animation command configured to animate the user's selected avatar, the animation command may be transmitted together with avatar parameters that include only the passive component, or the animation command may be applied to the avatar parameters before transmission so that the transmitted avatar parameters include both the passive and interactive components. For input corresponding to an animation command configured to animate the remote user's feedback avatar displayed on the remote user's device, only the animation command may be transmitted.
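
By way of illustration only, operation 414 might be sketched in Python as follows; the parameter names are assumptions, and the `apply_locally` flag mirrors the two transmission cases described above.

```python
# Sketch of operation 414. Parameter names are assumptions; the split between
# "apply the command locally before sending" and "send the command itself"
# follows the two cases described in the paragraph above.

def generate_avatar_parameters(facial_features, animation_command=None, apply_locally=False):
    # Passive component: derived from face detection/tracking and feature extraction.
    params = {"passive": dict(facial_features)}
    if animation_command and apply_locally:
        # Interactive component folded into the parameters before transmission.
        params["interactive"] = {"command": animation_command}
    return params

passive_only = generate_avatar_parameters({"mouth_open": 0.3})
with_interactive = generate_avatar_parameters({"mouth_open": 0.3},
                                              animation_command="blush_face",
                                              apply_locally=True)
print(passive_only)
print(with_interactive)
```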

操作416包括轉換及編碼所擷取音訊。所擷取音訊可轉化成音訊信號（例如使用者語音信號）。使用者語音信號可根據操作402之化身選擇的音訊轉換部分來轉換。經轉換之使用者語音信號相應於化身語音信號。化身語音信號可使用已知用於經由網路傳輸至遠程裝置及/或虛擬空間的技術來編碼。可在操作418處傳輸經轉換及編碼之音訊。操作418可進一步包括傳輸動畫命令及化身參數中之至少一者。傳輸動畫命令係組配來允許遠程裝置藉由根據動畫命令修改化身參數而使本地所顯示化身成動畫。已在傳輸之前根據動畫命令修改的經傳輸化身參數可直接用來使顯示於遠程裝置上的化身成動畫。換言之，由動畫命令表示的對化身參數之修改可在本地執行或遠程執行。 Operation 416 includes transforming and encoding the captured audio. The captured audio may be converted into an audio signal (for example, a user voice signal). The user voice signal may be transformed according to the audio conversion portion of the avatar selection of operation 402. The transformed user voice signal corresponds to an avatar voice signal. The avatar voice signal may be encoded using known techniques for transmission via the network to the remote device and/or virtual space. The transformed and encoded audio may be transmitted at operation 418. Operation 418 may further include transmitting at least one of the animation command and the avatar parameters. A transmitted animation command is configured to allow the remote device to animate its locally displayed avatar by modifying the avatar parameters according to the animation command. Transmitted avatar parameters that have already been modified according to the animation command prior to transmission may be used directly to animate the avatar displayed on the remote device. In other words, the modification of the avatar parameters represented by the animation command may be performed locally or remotely.
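
By way of illustration only, a naive pitch shift by resampling is sketched below in Python (assuming numpy is available). Note that resampling changes pitch and duration together; practical systems typically use a phase vocoder or a similar technique so that pitch shifting and time extension can be controlled independently.

```python
# Naive pitch-shift sketch using resampling (numpy assumed to be available).
# Resampling changes pitch and duration together; production systems typically
# use a phase vocoder so pitch shift and time stretch can be controlled separately.
import numpy as np

def pitch_shift(samples, semitones):
    """Return samples resampled so the perceived pitch moves by `semitones`."""
    factor = 2.0 ** (semitones / 12.0)            # frequency ratio for the shift
    old_idx = np.arange(len(samples))
    new_idx = np.arange(0, len(samples), factor)  # fewer points -> higher pitch
    return np.interp(new_idx, old_idx, samples).astype(samples.dtype)

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    voice = np.sin(2 * np.pi * 220.0 * t).astype(np.float32)  # stand-in for captured speech
    avatar_voice = pitch_shift(voice, +6)                      # higher-pitched, cat-like voice
    print(len(voice), len(avatar_voice))
```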

操作420包括接收可為經轉換音訊之遠程編碼音訊。操作420進一步包括接收遠程動畫命令及遠程化身參數中之至少一者。遠程動畫命令可用來修改相應於遠程使用者之所顯示化身或本地使用者之所顯示回饋化身的化身參數。動畫命令及化身參數係組配來產生基於使用者輸入加以修改的化身動畫。在操作422處,所接收之音訊可獲解碼及播放,且在操作424處,化身可獲顯示及成動畫。 Operation 420 includes receiving remotely encoded audio, which may be converted audio. Operation 420 further includes receiving at least one of a remote animation command and remote avatar parameters. The remote animation command can be used to modify the avatar parameters corresponding to the remote user's displayed avatar or the local user's displayed feedback avatar. Animation commands and avatar parameters are combined to generate an avatar animation modified based on user input. At operation 422, the received audio can be decoded and played, and at operation 424, the avatar can be displayed and animated.

所顯示化身之動畫可基於所偵測及識別之使用者輸入，如本文所述。在裝置至裝置通訊（例如系統100）之示例中，遠程化身選擇或遠程化身參數中至少一者可接收自遠程裝置。相應於遠程使用者之化身可隨後基於所接收之遠程化身選擇來顯示，且可基於所接收之遠程化身參數而成動畫。在虛擬位置交互作用（例如系統126）之示例中，可接收允許裝置顯示相應於裝置使用者之化身所看見的內容的資訊。 Animation of the displayed avatar may be based on detected and identified user input, as described herein. In the example of device-to-device communication (for example, system 100), at least one of a remote avatar selection or remote avatar parameters may be received from the remote device. An avatar corresponding to the remote user may then be displayed based on the received remote avatar selection and may be animated based on the received remote avatar parameters. In the example of virtual place interaction (for example, system 126), information may be received that allows the device to display what the avatar corresponding to the device user sees.
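
By way of illustration only, the receive side (operations 420 through 424) might apply a remote animation command to received remote avatar parameters as in the following Python sketch; the command handlers are invented for this example, and the disclosure only requires that remote animation commands and/or remote avatar parameters modify the displayed avatar's animation.

```python
# Receive-side sketch (operations 420-424). The handlers are hypothetical.

COMMAND_HANDLERS = {
    "blush_face": lambda p: {**p, "face_color": "red"},
    "wiggle_ears": lambda p: {**p, "ear_motion": "wiggle"},
}

def animate_displayed_avatar(remote_avatar_params, remote_animation_command=None):
    params = dict(remote_avatar_params)  # passive animation from the remote user
    if remote_animation_command in COMMAND_HANDLERS:
        params = COMMAND_HANDLERS[remote_animation_command](params)  # interactive animation
    return params  # handed to the display module for rendering

print(animate_displayed_avatar({"mouth_open": 0.2}, "blush_face"))
```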

可在操作426處判定通訊是否完成。若通訊完成,即可在操作428處結束程式流。若通訊未完成,程式流即可繼續進行至操作406,擷取影像、深度及/或音訊。 It can be determined at operation 426 whether the communication is complete. If the communication is completed, the program flow can be ended at operation 428. If the communication is not completed, the program flow can continue to operation 406 to capture images, depth, and/or audio.

雖然圖4例示根據一實施例之各種操作，但是要理解的是，並非圖4中描繪的所有操作皆為其他實施例所必需。事實上，本文完全涵蓋的是，本揭示案之其他實施例、圖4中描繪之操作及/或本文描述之其他操作均可以一方式組合，該組合方式並未明確展示於隨附圖式之任何圖式中，但仍完全與本揭示案一致。因此，針對並未確切展示於一圖式中的特徵及/或操作的請求項被視為屬於本揭示案之範疇及內容。 Although FIG. 4 illustrates various operations according to one embodiment, it is to be understood that not all of the operations depicted in FIG. 4 are necessary for other embodiments. Indeed, it is fully contemplated herein that other embodiments of the present disclosure, the operations depicted in FIG. 4, and/or other operations described herein may be combined in a manner not specifically shown in any of the accompanying drawings, yet still fully consistent with the present disclosure. Thus, claims directed to features and/or operations that are not exactly shown in one drawing are deemed within the scope and content of the present disclosure.

如本文中任何實施例所使用,「應用程式(app)」一詞可以代碼或指令體現,該等代碼或指令可在諸如主機處理器的可規劃電路或其他可規劃電路上執行。 As used in any of the embodiments herein, the term "application (app)" can be embodied by codes or instructions, which can be executed on a programmable circuit such as a host processor or other programmable circuits.

如本文中任何實施例所使用，「模組」一詞可代表app、軟體、韌體及/或電路，其組配來執行上述操作中之任何操作。軟體可體現為套裝軟體、記錄於至少一非暫時性電腦可讀儲存媒體上之代碼、指令、指令集及/或資料。韌體可體現為硬編碼（例如非依電性）於記憶體裝置中的代碼、指令或指令集及/或資料。 As used in any embodiment herein, the term "module" may refer to an app, software, firmware, and/or circuitry configured to perform any of the above-mentioned operations. Software may be embodied as a software package, code, instructions, instruction sets, and/or data recorded on at least one non-transitory computer-readable storage medium. Firmware may be embodied as code, instructions or instruction sets, and/or data that are hard-coded (for example, non-volatile) in a memory device.

如本文中任何實施例所使用，「電路」可包含例如單獨的或呈任何組合的硬連線電路；可規劃電路，諸如包含一或多個單獨指令處理核心之電腦處理器；狀態機電路及/或儲存藉由可規劃電路執行之指令的韌體。模組可共同地或單獨地體現為形成大型系統之部分的電路，例如積體電路(IC)、系統單晶片(SoC)、桌上型電腦、膝上型電腦、平板電腦、伺服器、智慧型電話等等。 As used in any embodiment herein, "circuitry" may comprise, for example, singly or in any combination, hardwired circuitry; programmable circuitry, such as a computer processor comprising one or more individual instruction processing cores; state machine circuitry; and/or firmware that stores instructions executed by programmable circuitry. The modules may, collectively or individually, be embodied as circuitry that forms part of a larger system, for example an integrated circuit (IC), a system on-chip (SoC), a desktop computer, a laptop computer, a tablet computer, a server, a smartphone, and so on.

如此所描述之任何操作可實行於包括一或多個儲存媒體之系統中，該等儲存媒體上儲存有單獨的或呈組合的指令，在藉由一或多個處理器執行該等指令時，該等指令執行該等方法。在此，處理器可包括例如伺服器CPU、行動裝置CPU及/或其他可規劃電路。此外，本文描述之操作意欲可跨越多個實體裝置來分散，該等實體裝置諸如處在一個以上不同實體位置處的處理結構。儲存媒體可包括任何類型的有形媒體，例如，任何類型之碟片，包括硬碟、軟碟片、光碟、光碟片-唯讀記憶體(CD-ROM)、可重寫光碟片(CD-RW)及磁光碟；半導體裝置，諸如唯讀記憶體(ROM)、隨機存取記憶體(RAM)（諸如動態及靜態RAM）、可抹除可規劃唯讀記憶體(EPROM)、電氣可抹除可規劃唯讀記憶體(EEPROM)、快閃記憶體、固態碟片(SSD)、磁性或光學卡；或者適合於儲存電子指令的任何類型之媒體。其他實施例可實行為藉由可規劃控制裝置執行之軟體模組。儲存媒體可為非暫時性的。 Any of the operations described herein may be implemented in a system that includes one or more storage media having stored thereon, individually or in combination, instructions that when executed by one or more processors perform the methods. Here, the processor may include, for example, a server CPU, a mobile device CPU, and/or other programmable circuitry. Also, it is intended that the operations described herein may be distributed across a plurality of physical devices, such as processing structures at more than one different physical location. The storage media may include any type of tangible media, for example, any type of disk, including hard disks, floppy disks, optical disks, compact disc read-only memories (CD-ROMs), compact disc rewritables (CD-RWs), and magneto-optical disks; semiconductor devices such as read-only memories (ROMs), random access memories (RAMs) such as dynamic and static RAMs, erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), flash memories, solid state disks (SSDs), magnetic or optical cards; or any type of media suitable for storing electronic instructions. Other embodiments may be implemented as software modules executed by a programmable control device. The storage media may be non-transitory.

因此，本揭示案提供一種用於使化身交互地成動畫以替代活動影像來進行視訊通訊的方法及系統。與活動影像之發送相比，化身之使用減少要交換的資訊之量。該系統及方法進一步組配來藉由例如音調偏移及/或使所擷取音訊信號時間延長而將使用者語音轉換成化身語音。化身之互動動畫可基於所偵測之使用者輸入，包括觸摸及手勢。互動動畫係組配來修改基於面部偵測及追蹤判定之動畫。 Thus, the present disclosure provides a method and system for interactively animating avatars to replace live images for video communication. The use of avatars reduces the amount of information to be exchanged compared to the sending of live images. The system and method are further configured to transform the user's voice into an avatar voice by, for example, pitch shifting and/or extending the time of the captured audio signal. The interactive animation of an avatar may be based on detected user input, including touches and gestures. The interactive animation is configured to modify animation determined based on facial detection and tracking.

根據一態樣，提供一種系統。該系統可包括：使用者輸入裝置，其組配來擷取使用者輸入；通訊模組，其組配來傳輸及接收資訊；以及一或多個儲存媒體。此外，該一或多個儲存媒體上儲存有單獨的或呈組合的指令，在藉由一或多個處理器執行該等指令時產生以下操作，包含：選擇化身；起始通訊；偵測使用者輸入；識別使用者輸入；基於使用者輸入識別動畫命令；產生化身參數；以及傳輸動畫命令及化身參數中之至少一者。 According to one aspect, a system is provided. The system may include: a user input device configured to capture a user input; a communication module configured to transmit and receive information; and one or more storage media. In addition, the one or more storage media have stored thereon, individually or in combination, instructions that when executed by one or more processors result in the following operations including: selecting an avatar; initiating communication; detecting a user input; identifying the user input; identifying an animation command based on the user input; generating avatar parameters; and transmitting at least one of the animation command and the avatar parameters.

另一示例系統包括前述組件且進一步包括:麥克風,其組配來擷取聲音且將所擷取之聲音轉化成相應音訊信號;以及指令,當藉由一或多個處理器執行該等指令時產生以下額外操作:擷取使用者語音且將使用者語音轉化成相應使用者語音信號;將使用者語音信號轉換成化身語音信號;以及傳輸化身語音信號。 Another example system includes the aforementioned components and further includes: a microphone configured to capture sound and convert the captured sound into a corresponding audio signal; and instructions, when the instructions are executed by one or more processors The following additional operations are generated: capturing the user's voice and converting the user's voice into a corresponding user's voice signal; converting the user's voice signal into an avatar voice signal; and transmitting the avatar voice signal.

另一示例系統包括前述組件且進一步包括:攝影機,其組配來擷取影像;以及指令,當藉由一或多個處理器執行該等指令時產生以下額外操作:擷取影像;偵測影像中的臉部;自臉部提取特徵;以及將特徵轉化成化身參數。 Another example system includes the aforementioned components and further includes: a camera configured to capture an image; and instructions, when the instructions are executed by one or more processors, the following additional operations are generated: capture an image; detect an image The face in, extract features from the face, and convert the features into avatar parameters.

另一示例系統包括前述組件且進一步包括：顯示器；以及指令，當藉由一或多個處理器執行該等指令時產生以下額外操作：顯示至少一化身；接收遠程動畫命令及遠程化身參數中之至少一者；以及基於遠程動畫命令及遠程化身參數中之至少一者使一所顯示化身成動畫。 Another example system includes the foregoing components and further includes: a display; and instructions that when executed by one or more processors result in the following additional operations: displaying at least one avatar; receiving at least one of a remote animation command and remote avatar parameters; and animating a displayed avatar based on at least one of the remote animation command and the remote avatar parameters.

另一示例系統包括前述組件且進一步包括：揚聲器，其組配來將音訊信號轉換成聲音；以及指令，當藉由一或多個處理器執行該等指令時產生以下額外操作：接收遠程化身語音信號；以及將遠程化身語音信號轉化成化身語音。 Another example system includes the foregoing components and further includes: a speaker configured to convert an audio signal into sound; and instructions that when executed by one or more processors result in the following additional operations: receiving a remote avatar voice signal; and converting the remote avatar voice signal into avatar voice.

另一示例系統包括前述組件,且該使用者輸入裝置為組配來擷取距離之攝影機且該使用者輸入為手勢。 Another example system includes the aforementioned components, and the user input device is a camera configured to capture a distance, and the user input is a gesture.

另一示例系統包括前述組件,且該使用者輸入裝置為觸摸感應顯示器且該使用者輸入為觸摸事件。 Another example system includes the aforementioned components, and the user input device is a touch sensitive display and the user input is a touch event.

另一示例系統包括前述組件,且該轉換包含音調偏移及時間延長中之至少一者。 Another example system includes the aforementioned components, and the conversion includes at least one of pitch shift and time extension.

根據另一態樣,提供一種方法。該方法可包括選擇化身;起始通訊;偵測使用者輸入;識別使用者輸入;基於使用者輸入識別動畫命令;基於動畫命令產生化身參數;及傳輸動畫命令及化身參數中之至少一者。 According to another aspect, a method is provided. The method may include selecting an avatar; initiating communication; detecting user input; recognizing user input; recognizing animation commands based on user input; generating avatar parameters based on the animation commands; and transmitting at least one of the animation commands and avatar parameters.

另一示例方法包括前述操作且進一步包括:擷取使用者語音且將使用者語音轉化成相應使用者語音信號;將使用者語音信號轉換成化身語音信號;以及傳輸化身語音信號。 Another example method includes the foregoing operations and further includes: capturing user voice and converting the user voice into a corresponding user voice signal; converting the user voice signal into an avatar voice signal; and transmitting the avatar voice signal.

另一示例方法包括前述操作且進一步包括:擷取影像;偵測影像中的臉部;自臉部提取特徵;以及將特徵轉化成化身參數。 Another example method includes the foregoing operations and further includes: capturing an image; detecting a face in the image; extracting features from the face; and converting the features into avatar parameters.

另一示例方法包括前述操作且進一步包括：顯示至少一化身；接收遠程動畫命令及遠程化身參數中之至少一者；以及基於遠程動畫命令及遠程化身參數中之至少一者使一所顯示化身成動畫。 Another example method includes the foregoing operations and further includes: displaying at least one avatar; receiving at least one of a remote animation command and remote avatar parameters; and animating a displayed avatar based on at least one of the remote animation command and the remote avatar parameters.

另一示例方法包括前述操作且進一步包括:接收遠程化身語音信號;以及將遠程化身語音信號轉化成化身語音。 Another example method includes the foregoing operations and further includes: receiving a remote avatar voice signal; and converting the remote avatar voice signal into an avatar voice.

另一示例方法包括前述操作且該使用者輸入為手勢。 Another example method includes the aforementioned operations and the user input is a gesture.

另一示例方法包括前述操作且該使用者輸入為觸摸事件。 Another example method includes the aforementioned operations and the user input is a touch event.

另一示例方法包括前述操作且該轉換包含音調偏移及時間延長中之至少一者。 Another example method includes the foregoing operations and the conversion includes at least one of pitch shift and time extension.

根據另一態樣，提供一種系統。該系統可包括一或多個儲存媒體，該一或多個儲存媒體上儲存有單獨的或呈組合的指令，在藉由一或多個處理器執行該等指令時產生以下操作，包括選擇化身；起始通訊；偵測使用者輸入；識別使用者輸入；基於使用者輸入識別動畫命令；產生化身參數；以及傳輸動畫命令及化身參數中之至少一者。 According to another aspect, a system is provided. The system may include one or more storage media having stored thereon, individually or in combination, instructions that when executed by one or more processors result in the following operations including: selecting an avatar; initiating communication; detecting a user input; identifying the user input; identifying an animation command based on the user input; generating avatar parameters; and transmitting at least one of the animation command and the avatar parameters.

另一示例系統包括指令，當藉由一或多個處理器執行該等指令時產生前述操作，並且亦包括：擷取使用者語音且將使用者語音轉化成相應使用者語音信號；將使用者語音信號轉換成化身語音信號；以及傳輸化身語音信號。 Another example system includes instructions that generate the foregoing operations when executed by one or more processors, and also includes: capturing the user's voice and converting the user's voice into a corresponding user voice signal; transforming the user voice signal into an avatar voice signal; and transmitting the avatar voice signal.

另一示例系統包括指令，當藉由一或多個處理器執行該等指令時產生前述操作，並且亦包括：擷取影像；偵測影像中的臉部；自臉部提取特徵；以及將特徵轉化成化身參數。 Another example system includes instructions that generate the foregoing operations when executed by one or more processors, and also includes: capturing an image; detecting a face in the image; extracting features from the face; and converting the features into avatar parameters.

另一示例系統包括指令，當藉由一或多個處理器執行該等指令時產生前述操作，並且亦包括：顯示至少一化身；接收遠程動畫命令及遠程化身參數中之至少一者；以及基於遠程動畫命令及遠程化身參數中之至少一者使一所顯示化身成動畫。 Another example system includes instructions that generate the foregoing operations when executed by one or more processors, and also includes: displaying at least one avatar; receiving at least one of a remote animation command and remote avatar parameters; and animating a displayed avatar based on at least one of the remote animation command and the remote avatar parameters.

另一示例系統包括指令,當藉由一或多個處理器執行該等指令時產生前述操作,並且亦包括:接收遠程化身語音信號;以及將遠程化身語音信號轉化成化身語音。 Another example system includes instructions that generate the aforementioned operations when the instructions are executed by one or more processors, and also includes: receiving a remote avatar voice signal; and converting the remote avatar voice signal into an avatar voice.

另一示例系統包括指令,當藉由一或多個處理器執行該等指令時產生前述操作,並且該使用者輸入為手勢。 Another example system includes instructions that when executed by one or more processors generate the aforementioned operations, and the user input is a gesture.

另一示例系統包括指令,當藉由一或多個處理器執行該等指令時產生前述操作,並且該使用者輸入為觸摸事件。 Another example system includes instructions that generate the aforementioned operations when the instructions are executed by one or more processors, and the user input is a touch event.

另一示例系統包括指令,當藉由一或多個處理器執行該等指令時產生前述操作,並且該轉換包含音調偏移及時間延長中之至少一者。 Another example system includes instructions that when executed by one or more processors produce the aforementioned operations, and the conversion includes at least one of pitch shift and time extension.

本文已使用之用詞及表述係用作描述之用詞且並非限制，且在使用此等用詞及表述時，不欲排除所展示及所描述的特徵之任何等效物（或其部分），且應認識到，在申請專利範圍之範疇內，可能存在各種修改。因此，申請專利範圍意欲涵蓋所有此類等效物。 The terms and expressions which have been employed herein are used as terms of description and not of limitation, and there is no intention, in the use of such terms and expressions, to exclude any equivalents (or portions thereof) of the features shown and described, and it is recognized that various modifications are possible within the scope of the claims. Accordingly, the claims are intended to cover all such equivalents.

100:裝置至裝置系統/系統 100: device-to-device system/system

102、112:裝置/遠程裝置 102, 112: device/remote device

104、114:攝影機 104, 114: Camera

106、116:麥克風 106, 116: Microphone

107、117:揚聲器 107, 117: Speaker

108、118:觸摸感應顯示器/顯示器 108, 118: Touch-sensitive display/display

110、120:化身 110, 120: Avatar

122:網路 122: Network

124:伺服器 124: Server

Claims (28)

一種系統,其包括: A system including: 使用者輸入裝置,其配置成捕獲使用者輸入; A user input device configured to capture user input; 通訊模組,其配置成傳輸以及接收資訊;以及 A communication module configured to transmit and receive information; and 一個或多個儲存媒體,其上單獨地或組合地儲存有指令,所述指令在由一個或多個處理器執行時促成要由第一裝置執行下列操作,其包括: One or more storage media stored thereon, individually or in combination, instructions that when executed by one or more processors cause the first device to perform the following operations, which include: 選擇化身; Choose an avatar 接收使用者的至少一個影像; Receive at least one image of the user; 被動地使所述化身至少部分基於所述至少一個影像而動畫地呈現,以便產生被動動畫呈現的化身以在遠端裝置上顯示,其中所述被動動畫呈現的化身模擬使用者的身體部位的運動; Passively cause the avatar to be animatedly presented based at least in part on the at least one image, so as to generate a passively animated avatar for display on a remote device, wherein the passively animated avatar simulates the movement of a user's body part ; 使用所述使用者輸入裝置檢測使用者輸入; Using the user input device to detect user input; 至少部分基於所述使用者輸入判定所述互動動畫呈現的化身的互動動畫; Determining the interactive animation of the avatar presented by the interactive animation based at least in part on the user input; 採用所述互動動畫來修改所述被動動畫呈現的化身,使得產生互動動畫呈現的化身以便在所述第一裝置上作為回饋化身來顯示;以及 Using the interactive animation to modify the avatar presented by the passive animation, so that the avatar presented by the interactive animation is generated for display as a feedback avatar on the first device; and 傳輸信號到所述遠端裝置,所述信號配置成促使所述互動動畫呈現的化身在所述遠端裝置上顯示; Transmitting a signal to the remote device, the signal being configured to cause the avatar presented in the interactive animation to be displayed on the remote device; 其中所述使用者輸入裝置包括觸控式螢幕,並且所述使用者輸入包括在所述觸控式螢幕上的觸摸事 件。 Wherein the user input device includes a touch screen, and the user input includes touch events on the touch screen Pieces. 如請求項1所述的系統,其進一步包括: The system according to claim 1, which further includes: 麥克風,其配置成捕獲聲音並且將捕獲的聲音轉換成對應的音訊信號,其中所述指令在由一個或多個處理器執行時促成下列額外操作: A microphone configured to capture sound and convert the captured sound into a corresponding audio signal, wherein the instructions, when executed by one or more processors, cause the following additional operations: 捕獲使用者語音並且將所述使用者語音轉換成對應的使用者語音信號; Capturing the user's voice and converting the user's voice into a corresponding user's voice signal; 將所述使用者語音信號變換成化身語音信號;以及 Transforming the user voice signal into an avatar voice signal; and 傳輸所述化身語音信號到所述遠端裝置。 Transmitting the avatar voice signal to the remote device. 如請求項1所述的系統,其包括攝影機,所述拍攝裝置配置成捕獲影像,其中所述指令在由一個或多個處理器執行時促成下列額外操作: The system according to claim 1, comprising a camera configured to capture an image, wherein the instructions, when executed by one or more processors, cause the following additional operations: 捕獲所述使用者的所述至少一個影像; Capturing the at least one image of the user; 檢測所述至少一個影像中的臉部; Detecting the face in the at least one image; 從臉部提取特徵; Extract features from the face; 將所述特徵轉換成化身參數,以及 Convert the features into avatar parameters, and 至少部分基於所述化身參數來產生所述被動動畫呈現的化身。 The passively animated avatar is generated based at least in part on the avatar parameter. 
如請求項1所述的系統,其進一步包括顯示器,其中所述指令在由一個或多個處理器執行時促成下列額外操作: The system of claim 1, further comprising a display, wherein the instructions, when executed by one or more processors, cause the following additional operations: 在所述顯示器上顯示至少一個化身; Displaying at least one avatar on the display; 接收遠端動畫命令和遠端化身參數中的至少一個;以及 Receiving at least one of a remote animation command and a remote avatar parameter; and 被動地使所述至少一個化身基於所述遠端動畫命令和所述遠端化身參數中的至少一個而動畫地呈現。 Passively causing the at least one avatar to be animatedly presented based on at least one of the remote animation command and the remote avatar parameter. 如請求項1所述的系統,其進一步包括揚聲器,所述揚聲器配置成將音訊信號轉換成聲音,其中所述指令在由一個或多個處理器執行時促成下列額外操作: The system of claim 1, further comprising a speaker configured to convert an audio signal into sound, wherein the instructions, when executed by one or more processors, cause the following additional operations: 接收遠端化身語音信號;以及 Receive remote avatar voice signals; and 將所述遠端化身語音信號轉換成化身語音。 Convert the remote avatar voice signal into avatar voice. 如請求項1至5中任一項所述的系統,其中所述使用者輸入是在所述觸控式螢幕上的映射到所述被動動畫呈現的化身的一個或多個點的一個或多個觸摸位點處的觸摸事件。 The system according to any one of claims 1 to 5, wherein the user input is one or more points on the touch screen mapped to one or more points of the avatar presented by the passive animation Touch events at touch locations. 如請求項2所述的系統,其中所述變換包括移調和時間延展中的至少一個。 The system according to claim 2, wherein the transformation includes at least one of transposition and time extension. 一種要由第一裝置執行的方法,其包括: A method to be executed by a first device, which includes: 選擇化身; Choose an avatar 接收使用者的至少一個影像; Receive at least one image of the user; 被動地使所述化身至少部分基於所述至少一個影像而動畫地呈現,以便產生被動動畫呈現的化身以在遠端裝置上顯示,其中所述被動動畫呈現的化身模擬使用者的身體部位的運動; Passively cause the avatar to be animatedly presented based at least in part on the at least one image, so as to generate a passively animated avatar for display on a remote device, wherein the passively animated avatar simulates the movement of a user's body part ; 使用使用者輸入裝置檢測使用者輸入; Use a user input device to detect user input; 至少部分基於所述使用者輸入判定所述被動動畫呈現的化身的互動動畫; Determining the interactive animation of the avatar presented by the passive animation based at least in part on the user input; 採用所述互動動畫來修改所述被動動畫呈現的化 身,使得產生互動動畫呈現的化身以便在所述第一裝置上作為回饋化身來顯示;以及 Use the interactive animation to modify the rendering of the passive animation An avatar, such that an avatar presented in an interactive animation is generated to be displayed as a feedback avatar on the first device; and 傳輸信號到所述遠端裝置,所述信號配置成促使所述互動動畫呈現的化身在所述遠端裝置上顯示; Transmitting a signal to the remote device, the signal being configured to cause the avatar presented in the interactive animation to be displayed on the remote device; 其中所述使用者輸入裝置包括觸控式螢幕,並且所述使用者輸入包括在所述觸控式螢幕上的觸摸事件。 The user input device includes a touch screen, and the user input includes a touch event on the touch screen. 如請求項8所述的方法,其進一步包括: The method according to claim 8, which further includes: 捕獲使用者語音並且將所述使用者語音轉換成對應的使用者語音信號; Capturing the user's voice and converting the user's voice into a corresponding user's voice signal; 將所述使用者語音信號變換成化身語音信號;以及 Transforming the user voice signal into an avatar voice signal; and 傳輸所述化身語音信號到所述遠端裝置。 Transmitting the avatar voice signal to the remote device. 
如請求項8所述的方法,其進一步包括: The method according to claim 8, which further includes: 捕獲所述使用者的所述至少一個影像; Capturing the at least one image of the user; 檢測所述至少一個影像中的臉部; Detecting the face in the at least one image; 從臉部提取特徵;將所述特徵轉換成化身參數;以及 Extracting features from the face; converting the features into avatar parameters; and 至少部分基於所述化身參數來產生所述被動動畫呈現的化身。 The passively animated avatar is generated based at least in part on the avatar parameter. 如請求項8所述的方法,其進一步包括: The method according to claim 8, which further includes: 在顯示器上顯示至少一個化身; Display at least one avatar on the display; 接收遠端動畫命令和遠端化身參數中的至少一個;以及 Receiving at least one of a remote animation command and a remote avatar parameter; and 被動地使所述至少一個化身基於所述遠端動畫命令和遠端化身參數中的至少一個而動畫地呈現。 Passively causing the at least one avatar to be animatedly presented based on at least one of the remote animation command and the remote avatar parameter. 如請求項8所述的方法,其進一步包括: The method according to claim 8, which further includes: 接收遠端化身語音信號;以及 Receive remote avatar voice signals; and 將所述遠端化身語音信號轉換成化身語音。 Convert the remote avatar voice signal into avatar voice. 如請求項8至12中任一項所述的方法,其中所述使用者輸入是在所述觸控式螢幕上的映射到所述被動動畫呈現的化身的一個或多個點的一個或多個觸摸位點處的觸摸事件。 The method according to any one of claims 8 to 12, wherein the user input is one or more points on the touch screen mapped to one or more points of the avatar presented by the passive animation Touch events at touch locations. 如請求項9所述的方法,其中所述變換包括移調和時間延展中的至少一個。 The method according to claim 9, wherein the transformation includes at least one of transposition and time extension. 一種系統,其包括一個或多個儲存媒體,所述一個或多個儲存媒體在其上單獨地或組合地儲存有指令,所述指令在由一個或多個處理器執行時促成要由第一裝置執行下列操作,其包括: A system, which includes one or more storage media, the one or more storage media separately or in combination stores instructions thereon, the instructions when executed by one or more processors cause a first The device performs the following operations, which include: 選擇化身; Choose an avatar 接收使用者的至少一個影像; Receive at least one image of the user; 被動地使所述化身至少部分基於所述至少一個影像而動畫地呈現,以便產生被動動畫呈現的化身以在遠端裝置上顯示,其中所述被動動畫呈現的化身模擬使用者的身體部位的運動; Passively cause the avatar to be animatedly presented based at least in part on the at least one image, so as to generate a passively animated avatar for display on a remote device, wherein the passively animated avatar simulates the movement of a user's body part ; 使用使用者輸入裝置檢測使用者輸入; Use a user input device to detect user input; 至少部分基於所述使用者輸入判定所述被動動畫呈現的化身的互動動畫; Determining the interactive animation of the avatar presented by the passive animation based at least in part on the user input; 採用所述互動動畫來修改所述被動動畫呈現的化身,使得產生互動動畫呈現的化身以便在所述第一裝置 上作為回饋化身來顯示;以及 The interactive animation is used to modify the avatar presented by the passive animation, so that the avatar presented by the interactive animation is generated for the first device To show up as an avatar of feedback; and 傳輸信號到所述遠端裝置,所述信號配置成促使所述互動動畫呈現的化身在所述遠端裝置上顯示; Transmitting a signal to the remote device, the signal being configured to cause the avatar presented in the interactive animation to be displayed on the remote device; 其中所述使用者輸入裝置包括觸控式螢幕,並且所述使用者輸入包括在所述觸控式螢幕上的觸摸事件。 The user input device includes a touch screen, and the user input includes a touch event on the touch screen. 
如請求項15所述的系統,其中所述指令在由一個或多個處理器執行時促成下列額外操作: The system of claim 15, wherein the instructions, when executed by one or more processors, cause the following additional operations: 捕獲使用者語音並且將所述使用者語音轉換成對應的使用者語音信號; Capturing the user's voice and converting the user's voice into a corresponding user's voice signal; 將所述使用者語音信號變換成化身語音信號;以及 Transforming the user voice signal into an avatar voice signal; and 傳輸所述化身語音信號到所述遠端裝置。 Transmitting the avatar voice signal to the remote device. 如請求項15所述的系統,其中所述指令在由一個或多個處理器執行時促成下列額外操作: The system of claim 15, wherein the instructions, when executed by one or more processors, cause the following additional operations: 捕獲所述使用者的所述至少一個影像; Capturing the at least one image of the user; 檢測所述至少一個影像中的臉部; Detecting the face in the at least one image; 從臉部提取特徵; Extract features from the face; 將所述特徵轉換成化身參數,以及 Convert the features into avatar parameters, and 至少部分基於所述化身參數來產生所述被動動畫呈現的化身。 The passively animated avatar is generated based at least in part on the avatar parameter. 如請求項15所述的系統,其中所述指令在由一個或多個處理器執行時促成下列額外操作: The system of claim 15, wherein the instructions, when executed by one or more processors, cause the following additional operations: 在顯示器上顯示至少一個化身; Display at least one avatar on the display; 接收遠端動畫命令和遠端化身參數中的至少一個;以及 Receiving at least one of a remote animation command and a remote avatar parameter; and 被動地使所述至少一個顯示的化身基於所述遠端動畫命令和遠端化身參數中的至少一個而動畫地呈現。 Passively causing the at least one displayed avatar to be animatedly presented based on at least one of the remote animation command and the remote avatar parameter. 如請求項15所述的系統,其中所述指令在由一個或多個處理器執行時促成下列額外操作: The system of claim 15, wherein the instructions, when executed by one or more processors, cause the following additional operations: 接收遠端化身語音信號;以及 Receive remote avatar voice signals; and 將所述遠端化身語音信號轉換成化身語音。 Convert the remote avatar voice signal into avatar voice. 如請求項15至19中任一項所述的系統,其中所述使用者輸入是在所述觸控式螢幕上的映射到所述被動動畫呈現的化身的一個或多個點的一個或多個觸摸位點處的觸摸事件。 The system according to any one of claims 15 to 19, wherein the user input is one or more points on the touch screen mapped to one or more points of the avatar presented by the passive animation Touch events at touch locations. 如請求項16所述的系統,其中所述變換包括移調和時間延展中的至少一個。 The system according to claim 16, wherein the transformation includes at least one of transposition and time extension. 
一種要由第一裝置使用的設備,其包括: A device to be used by a first device, which includes: 用於選擇化身的部件; Parts used to select the avatar; 用於接收使用者的至少一個影像的部件; A component for receiving at least one image of the user; 用於被動地使所述化身至少部分基於所述至少一個影像而動畫地呈現以便產生被動動畫呈現的化身以在遠端裝置上顯示的部件,其中所述被動動畫呈現的化身模擬使用者的身體部位的運動; A component for passively causing the avatar to be animated at least partially based on the at least one image so as to generate a passively animated avatar for display on a remote device, wherein the passively animated avatar simulates the body of the user Movement of parts; 用於使用使用者輸入裝置檢測使用者輸入的部件; A component used to detect user input using a user input device; 用於至少部分基於所述使用者輸入判定所述被動動畫呈現的化身的互動動畫的部件; Means for determining the interactive animation of the avatar presented by the passive animation based at least in part on the user input; 用於採用所述互動動畫來修改所述被動動畫呈現的化身使得產生互動動畫呈現的化身以便在所述第一裝置 上作為回饋化身來顯示的部件;以及 For using the interactive animation to modify the avatar presented by the passive animation so that the avatar presented by the interactive animation is generated so as to be displayed on the first device The components displayed as the avatar of the feedback; and 用於傳輸信號到所述遠端裝置的部件,所述信號配置成促使所述互動動畫呈現的化身在所述遠端裝置上顯示; A component for transmitting a signal to the remote device, the signal being configured to cause the avatar presented in the interactive animation to be displayed on the remote device; 其中所述使用者輸入裝置包括觸控式螢幕,並且所述使用者輸入包括在所述觸控式螢幕上的觸摸事件。 The user input device includes a touch screen, and the user input includes a touch event on the touch screen. 如請求項22所述的設備,其進一步包括: The device according to claim 22, which further includes: 用於捕獲使用者語音並且將所述使用者語音轉換成對應的使用者語音信號的部件; A component for capturing the user's voice and converting the user's voice into a corresponding user's voice signal; 用於將所述使用者語音信號變換成化身語音信號的部件;以及 Means for transforming the user's voice signal into an avatar voice signal; and 用於傳輸所述化身語音信號到所述遠端裝置的部件。 A component used to transmit the avatar voice signal to the remote device. 如請求項22所述的設備,其進一步包括: The device according to claim 22, which further includes: 用於捕獲所述使用者的所述至少一個影像的部件; Means for capturing the at least one image of the user; 用於檢測所述至少一個影像中的臉部的部件; A component for detecting a face in the at least one image; 用於從臉部提取特徵的部件; Components used to extract features from the face; 用於將所述特徵轉換成化身參數的部件;以及 Means for converting the features into avatar parameters; and 用於至少部分基於所述化身參數來產生所述被動動畫呈現的化身的部件。 A component for generating the passively animated avatar based at least in part on the avatar parameter. 如請求項22所述的設備,其進一步包括: The device according to claim 22, which further includes: 用於在顯示器上顯示至少一個化身的部件; A component for displaying at least one avatar on the display; 用於接收遠端動畫命令和遠端化身參數中的至少一個的部件;以及 A component for receiving at least one of a remote animation command and a remote avatar parameter; and 用於被動地使所述至少一個化身基於所述遠端動畫命令和遠端化身參數中的至少一個而動畫地呈現的部件。 A component for passively causing the at least one avatar to be animatedly presented based on at least one of the remote animation command and the remote avatar parameter. 如請求項22所述的設備,其進一步包括: The device according to claim 22, which further includes: 用於接收遠端化身語音信號的部件;以及 Components for receiving voice signals of the remote avatar; and 用於將所述遠端化身語音信號轉換成化身語音的部件。 A component used to convert the remote avatar voice signal into an avatar voice. 
如請求項22至26中任一項所述的設備，其中所述使用者輸入是在所述觸控式螢幕上的映射到所述被動動畫呈現的化身的一個或多個點的一個或多個觸摸位點處的觸摸事件。 The device according to any one of claims 22 to 26, wherein the user input is a touch event at one or more touch locations on the touch screen that are mapped to one or more points of the passively animated avatar. 如請求項23所述的設備，其中所述變換包括移調和時間延展中的至少一個。 The device according to claim 23, wherein the transformation includes at least one of transposition and time extension.
TW109121460A 2013-04-08 2013-04-08 Communication using interactive avatars TW202107250A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW109121460A TW202107250A (en) 2013-04-08 2013-04-08 Communication using interactive avatars

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW109121460A TW202107250A (en) 2013-04-08 2013-04-08 Communication using interactive avatars

Publications (1)

Publication Number Publication Date
TW202107250A true TW202107250A (en) 2021-02-16

Family

ID=75745262

Family Applications (1)

Application Number Title Priority Date Filing Date
TW109121460A TW202107250A (en) 2013-04-08 2013-04-08 Communication using interactive avatars

Country Status (1)

Country Link
TW (1) TW202107250A (en)

Similar Documents

Publication Publication Date Title
US11303850B2 (en) Communication using interactive avatars
TWI656505B (en) System and method for avatar management and selection
TWI642306B (en) System and method for avatar generation, rendering and animation
US9398262B2 (en) Communication using avatar
US9936165B2 (en) System and method for avatar creation and synchronization
TWI682669B (en) Communication using interactive avatars
TWI583198B (en) Communication using interactive avatars
TW202107250A (en) Communication using interactive avatars