TWI682669B - Communication using interactive avatars - Google Patents


Info

Publication number
TWI682669B
Authority
TW
Taiwan
Prior art keywords
computing device
avatar
user
information
user input
Prior art date
Application number
TW107137526A
Other languages
Chinese (zh)
Other versions
TW201924321A (en)
Inventor
童曉芬
李文龍
杜楊洲
胡威
張益明
Original Assignee
美商英特爾公司
Priority date
Filing date
Publication date
Application filed by 美商英特爾公司
Priority to TW107137526A
Publication of TW201924321A
Application granted
Publication of TWI682669B

Landscapes

  • User Interface Of Digital Computer (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Generally this disclosure describes a video communication system that replaces actual live images of the participating users with animated avatars. A method may include selecting an avatar; initiating communication; detecting a user input; identifying the user input; identifying an animation command based on the user input; generating avatar parameters; and transmitting at least one of the animation command and the avatar parameters.

Description

Communication technology using interactive avatars (5)

Field of the Invention

The following disclosure relates to video communication and, more particularly, to video communication using interactive avatars.

Background of the Invention

The increasing variety of functionality available in mobile devices has created a desire among users to communicate via video rather than through simple voice calls alone. For example, a user may initiate a "video call", "video conference", and so on, in which a camera and a microphone in a device capture the user's audio and video, which are transmitted in real time to one or more other recipients such as other mobile devices, desktop computers, video conferencing systems, and the like. Video communication may involve the transmission of a substantial amount of data (e.g., depending on the camera technology, the particular video codec used to process the captured image data, etc.). Given the bandwidth limitations of existing 2G/3G wireless technologies and the still-limited bandwidth of emerging 4G wireless technologies, concurrent video calls placed by many device users may exceed the bandwidth available in the existing wireless communication infrastructure, which may negatively affect the quality of the video calls.

According to one embodiment of the present invention, one or more non-transitory computer-readable storage devices have instructions stored thereon that, when executed by at least one processor of a first computing device, result in operations comprising: enabling selection of a first avatar; identifying one or more facial features of a user of the first computing device; generating information to be transmitted to a second computing device to cause the selected first avatar to be animated on a display of the second computing device, wherein the information is based on the identified one or more facial features of the user of the first computing device; and enabling animation of the selected first avatar based on a user input command, wherein the user input command is distinct from the one or more facial features and, when controlled by the user of the first computing device, is generated by a user input device.

100‧‧‧device-to-device system/system
102, 112, 102'‧‧‧device/remote device
104, 114, 104'/114'‧‧‧camera
106, 116, 106', 116'‧‧‧microphone
107, 117‧‧‧speaker
108, 118‧‧‧touch-sensitive display/display
108', 118'‧‧‧display
110, 120‧‧‧avatar
112'‧‧‧device/remote device
122‧‧‧network
124, 124'‧‧‧server
126‧‧‧system
128‧‧‧virtual space
200‧‧‧camera, audio and touch-screen framework module
202‧‧‧face detection and tracking module/face detection/tracking module/face detection module
204‧‧‧feature extraction module
206‧‧‧audio transformation module
208‧‧‧touch detection module
210‧‧‧gesture detection module
212‧‧‧avatar selection module
214‧‧‧avatar control module
216‧‧‧display module
218‧‧‧feedback avatar
220‧‧‧communication module
222‧‧‧processor
300, 304‧‧‧WiFi connection
302‧‧‧Internet
306‧‧‧enterprise AP
308‧‧‧gateway
310‧‧‧firewall
312‧‧‧media and signal path
314‧‧‧home AP
400‧‧‧flowchart
402~428‧‧‧operations

The features and advantages of various embodiments of the claimed subject matter will become apparent as the following detailed description proceeds and upon reference to the drawings, in which like numerals designate like parts, and in which: FIG. 1A illustrates an example device-to-device system according to various embodiments of the present disclosure; FIG. 1B illustrates an example virtual space system according to various embodiments of the present disclosure; FIG. 2 illustrates an example device according to various embodiments of the present disclosure; FIG. 3 illustrates an example system implementation according to at least one embodiment of the present disclosure; and FIG. 4 is a flowchart of example operations according to at least one embodiment of the present disclosure.

Although the following detailed description proceeds with reference to illustrative embodiments, many alternatives, modifications, and variations of these embodiments will be apparent to those skilled in the art.

Detailed Description of the Preferred Embodiments

Generally, this disclosure describes systems and methods for video communication using interactive avatars. Using avatars, as opposed to live images, substantially reduces the amount of data to be transmitted, and thus avatar communication requires less bandwidth. Interactive avatars are configured to enhance the user experience by modifying the display of a selected avatar based on user input. In addition, the user's voice may be captured and transformed to produce an avatar voice. The avatar voice may then be related to the user's voice but may mask the user's identity. The audio transformation may include, for example, pitch shifting and/or time stretching.

In one embodiment, an application is activated on a device coupled to a camera, a microphone, and a speaker. The application may be configured to allow a user to select an avatar for display on a remote device, in a virtual space, etc. The device may then be configured to initiate communication with at least one other device, a virtual space, etc. For example, the communication may be established over a 2G, 3G, or 4G cellular connection. Alternatively or additionally, the communication may be established over the Internet via a WiFi connection. After the communication is established, the camera may be configured to start capturing images and/or distances to objects, and the microphone may be configured to start capturing sound, e.g., the user's voice, and to convert the user's voice into a user speech signal.

It may then be determined whether user input has been detected. The user input may be captured by a user input device. User input includes touch events captured by a touch-sensitive display and gestures captured by a camera, e.g., a depth camera configured to capture distances to objects and/or a web camera. The user input devices therefore include the touch-sensitive display and/or the camera. If user input is detected, the user input may be identified. For a touch event, the user input identifier may be related to a touch type and one or more touch locations. For a gesture (e.g., an open hand), the user input identifier may be related to a gesture identifier. An animation command may then be identified based on the user input. The animation command corresponds to a desired response associated with the user input, e.g., changing the color of the displayed avatar's face in response to a single tap on the displayed avatar's face.
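As a rough, hypothetical sketch of this lookup step, a recognized touch event or gesture identifier could simply be mapped to an animation command with dictionaries; the identifier names and command strings below are illustrative assumptions, not values defined by the disclosure.

```python
# Hypothetical mapping from recognized user inputs to animation commands.
TOUCH_COMMANDS = {
    ("single_tap", "face"): "change_face_color",
    ("double_tap", "nose"): "enlarge_nose",
    ("swipe", "eye"): "wink",
}

GESTURE_COMMANDS = {
    "open_hand": "smile",
    "closed_hand": "frown",
    "wave": "wave_back",
}

def identify_animation_command(user_input):
    """Return the animation command for a recognized touch event or gesture."""
    if user_input["kind"] == "touch":
        key = (user_input["touch_type"], user_input["region"])
        return TOUCH_COMMANDS.get(key)
    if user_input["kind"] == "gesture":
        return GESTURE_COMMANDS.get(user_input["gesture_id"])
    return None
```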

Avatar parameters may then be generated. The avatar parameters may be generated based on face detection, head movement, and/or animation commands. The avatar parameters may thus include passive components based on, e.g., face detection and head movement, and interactive components based on animation commands. The avatar parameters may be used to animate the avatar on at least one other device, within a virtual space, etc. In one embodiment, the avatar parameters may be generated based on face detection, head movement, and animation commands. In this embodiment, the resulting animation includes passive animation, based on face detection and head movement, modified by interactive animation based on the animation commands. Thus, avatar animation may include passive animation based on, e.g., face detection and head movement, and interactive animation based on user input.

At least one of the animation command and the avatar parameters may then be transmitted. In one embodiment, at least one of a remote animation command and remote avatar parameters is received. The remote animation command may cause the device to determine avatar parameters based on the remote animation command in order to animate the displayed avatar. The remote avatar parameters may cause the device to animate the displayed avatar based on the received remote avatar parameters.
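As a rough illustration, the transmitted information might be serialized as a small message carrying an animation command, avatar parameters, or both; the field names below are assumptions made for this sketch, not a format specified by the disclosure.

```python
import json

def build_avatar_message(animation_command=None, avatar_parameters=None):
    """Bundle the data to send to the remote device (field names are illustrative)."""
    message = {"type": "avatar_update"}
    if animation_command is not None:
        message["animation_command"] = animation_command   # e.g. "enlarge_nose"
    if avatar_parameters is not None:
        message["avatar_parameters"] = avatar_parameters    # e.g. list of feature points
    return json.dumps(message).encode("utf-8")

# Example: send only the command and let the remote side derive its own parameters.
payload = build_avatar_message(animation_command="enlarge_nose")
```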

Audio communication may accompany the avatar animation. After the communication is established, the microphone may be configured to capture audio input (sound), e.g., the user's voice, and to convert the captured sound into a corresponding audio signal (e.g., a user speech signal). In one embodiment, the user speech signal may be transformed into an avatar speech signal, which may then be encoded and transmitted. The received avatar speech signal may then be converted back into sound (e.g., avatar speech) by a speaker. The avatar speech may thus be based on the user's speech and may preserve the content while altering the spectral data associated with the captured speech. For example, transformations include, but are not limited to, pitch shifting, time stretching, and/or converting the playback rate.

User input devices (e.g., a touch-sensitive display and/or a camera) may be configured to capture user inputs that are configured to animate the avatar on at least one other device. User-driven animation (based on animation commands) may be in addition to animation based on facial expressions and/or head movement. Animation commands may include, but are not limited to, changes in the orientation of the displayed avatar, distortion of facial features, changes to features to convey emotion, and so on. The animation commands may thus modify the avatar animation similarly to, and/or in addition to, animation based on face detection/tracking. The animation commands may produce animations of limited duration, and the resulting animation may be illustrated on the displayed avatar of the local user based on input from a remote user.

Thus, a limited-bandwidth video communication system may be implemented using avatars. The audio may be transformed and the video may be animated based on detected user input and identified animation commands, thereby enhancing the user experience with avatar communication. In addition, anonymity may be preserved using the avatars, including the audio transformation described herein.

FIG. 1A illustrates a device-to-device system 100 consistent with various embodiments of the present disclosure. The system 100 may generally include devices 102 and 112 communicating via a network 122. The device 102 includes at least a camera 104, a microphone 106, a speaker 107, and a touch-sensitive display 108. The device 112 includes at least a camera 114, a microphone 116, a speaker 117, and a touch-sensitive display 118. The network 122 includes at least one server 124.

The devices 102 and 112 may include various hardware platforms capable of wired and/or wireless communication. For example, the devices 102 and 112 may include, but are not limited to, video conferencing systems, desktop computers, laptop computers, tablet computers, smartphones (e.g., iPhones®, Android®-based phones, Blackberries®, Symbian®-based phones, Palm®-based phones, etc.), cellular handsets, and the like. The cameras 104 and 114 include any device for capturing digital images representative of an environment that includes one or more persons, and may have adequate resolution for facial analysis and/or gesture recognition as described herein. For example, the cameras 104 and 114 may include still cameras (e.g., cameras configured to capture still photographs) or video cameras (e.g., cameras configured to capture moving images composed of multiple frames). The cameras 104 and 114 may be configured to operate using light in the visible spectrum or with other portions of the electromagnetic spectrum not limited to the infrared spectrum, ultraviolet spectrum, etc. In one embodiment, the cameras 104 and 114 may be configured to detect depth, i.e., the distance of the camera from an object and/or points on the object. The cameras 104 and 114 may be incorporated within the devices 102 and 112, respectively, or may be separate devices configured to communicate with the devices 102 and 112 via wired or wireless communication. Specific examples of the cameras 104 and 114 may include wired (e.g., Universal Serial Bus (USB), Ethernet, FireWire, etc.) or wireless (e.g., WiFi, Bluetooth, etc.) web cameras as may be associated with computers, video monitors, etc.; depth cameras; mobile device cameras (e.g., cell phone or smartphone cameras integrated in, for example, the example devices discussed previously); integrated laptop computer cameras; integrated tablet computer cameras (e.g., iPad®, Galaxy Tab®, and the like); etc.

The devices 102 and 112 may further include microphones 106 and 116 and speakers 107 and 117. The microphones 106 and 116 include any devices configured to sense (i.e., capture) sound and convert the sensed sound into a corresponding audio signal. The microphones 106 and 116 may be integrated within the devices 102 and 112, respectively, or may interact with the devices via wired or wireless communication, such as described in the above examples regarding the cameras 104 and 114. The speakers 107 and 117 include any devices configured to convert audio signals into corresponding sound. The speakers 107 and 117 may be integrated within the devices 102 and 112, respectively, or may interact with the devices via wired or wireless communication, such as described in the above examples regarding the cameras 104 and 114. The touch-sensitive displays 108 and 118 include any devices configured to display text, still images, moving images (e.g., video), user interfaces, graphics, etc., and configured to sense touch events such as taps, swipes, etc. A touch event may include a touch type and a touch location. The touch-sensitive displays 108 and 118 may be integrated within the devices 102 and 112, respectively, or may interact with the devices via wired or wireless communication, such as described in the above examples regarding the cameras 104 and 114. In one embodiment, the displays 108 and 118 are configured to display avatars 110 and 120, respectively. As referenced herein, an avatar is defined as a graphical representation of a user in either two dimensions (2D) or three dimensions (3D). Avatars need not resemble the looks of the user, and thus, while avatars can be lifelike representations, they can also take the form of drawings, cartoons, sketches, etc. In the system 100, the device 102 may display an avatar 110 representing the user of the device 112 (e.g., a remote user), and likewise, the device 112 may display an avatar 120 representing the user of the device 102. In this way, users may see a representation of other users without having to exchange the large amounts of information involved in device-to-device communication employing live images. Moreover, avatars may be animated based on user input. In this manner, users may interact with the display of a local and/or remote avatar, thereby enhancing the user experience. The resulting animations may provide a broader range of animation than would be possible using only face detection and tracking, and the users may actively select the animations.

As referenced herein, avatar audio (i.e., sound) is defined as transformed user audio (sound). For example, the sound input may include the user's voice, i.e., user speech, and the corresponding avatar audio may include transformed user speech. The avatar audio may be related to the user audio. For example, the avatar speech may correspond to pitch-shifted, time-stretched, and/or otherwise transformed user speech. The avatar speech may resemble human speech or may correspond to a cartoon character, etc. In the system 100, the device 102 may emit avatar audio representing the remote user of the device 112, and similarly, the device 112 may emit avatar audio representing the audio captured by the device 102 (e.g., the speech of the local user of the device 102). In this way, users may hear a representation of other users' voices that may be transformed.

The network 122 may include various second-generation (2G), third-generation (3G), and fourth-generation (4G) cellular-based data communication technologies, Wi-Fi wireless data communication technology, etc. The network 122 includes at least one server 124 configured to establish and maintain communication connections when using these technologies. For example, the server 124 may be configured to support Internet-related communication protocols such as the Session Initiation Protocol (SIP), used for creating, modifying, and terminating two-party (unicast) and multi-party (multicast) sessions; Interactive Connectivity Establishment (ICE), which presents a framework for allowing protocols to be built on top of byte-stream connections; Session Traversal Utilities for Network Address Translators (NAT) (STUN), which allows applications operating through a NAT to discover the presence of other NATs and the IP addresses and ports allocated for an application's User Datagram Protocol (UDP) connections to connect to remote hosts; Traversal Using Relays around NAT (TURN), which allows elements behind a NAT or firewall to receive data over Transmission Control Protocol (TCP) or UDP connections; and the like.
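As a hypothetical illustration of how a client might be pointed at such NAT-traversal infrastructure, a WebRTC-style ICE configuration simply lists STUN and TURN servers; the host names and credentials below are placeholders and are not part of the disclosure.

```python
# Hypothetical ICE configuration for NAT traversal; hostnames and credentials
# are placeholders, not values from the disclosure.
ice_config = {
    "iceServers": [
        {"urls": "stun:stun.example.com:3478"},
        {
            "urls": "turn:turn.example.com:3478",
            "username": "avatar-client",
            "credential": "placeholder-secret",
        },
    ]
}
```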

FIG. 1B illustrates a virtual space system 126 consistent with various embodiments of the present disclosure. The system 126 may employ the device 102, the device 112, and the server 124. The device 102, the device 112, and the server 124 may continue to communicate in a manner similar to that illustrated in FIG. 1A, but user interaction may take place in a virtual space 128 instead of in a device-to-device format. As referenced herein, a virtual space may be defined as a digital simulation of a physical location. For example, the virtual space 128 may resemble an outdoor location such as a city, road, sidewalk, field, forest, island, etc., or an indoor location such as an office, house, school, mall, store, etc. Users, represented by avatars, may appear to interact in the virtual space 128 as in the real world. The virtual space 128 may exist on one or more servers coupled to the Internet and may be maintained by a third party. Examples of virtual spaces include virtual offices, virtual meeting rooms, virtual worlds like Second Life®, massively multiplayer online role-playing games (MMORPGs) like World of Warcraft®, massively multiplayer online real-life games (MMORLGs) like The Sims Online®, etc. In the system 126, the virtual space 128 may contain a plurality of avatars corresponding to different users. Instead of displaying avatars, the displays 108 and 118 may display encapsulated (e.g., smaller) versions of the virtual space (VS) 128. For example, the display 108 may display a perspective view of what the avatar corresponding to the user of the device 102 "sees" in the virtual space 128. Similarly, the display 118 may display a perspective view of what the avatar corresponding to the user of the device 112 "sees" in the virtual space 128. Examples of what an avatar might see in the virtual space 128 include, but are not limited to, virtual structures (e.g., buildings), virtual vehicles, virtual objects, virtual animals, other avatars, etc.

FIG. 2 illustrates an example device 102 according to various embodiments of the present disclosure. While only the device 102 is described, the device 112 (e.g., a remote device) may include resources configured to provide the same or similar functions. As previously discussed, the device 102 is shown including the camera 104, the microphone 106, the speaker 107, and the touch-sensitive display 108. The camera 104, the microphone 106, and the touch-sensitive display 108 may provide input to a camera, audio, and touch-screen framework module 200, and the camera, audio, and touch-screen framework module 200 may provide output (e.g., audio signals) to the speaker 107. The camera, audio, and touch-screen framework module 200 may include custom, proprietary, known, and/or after-developed audio and video processing code (or instruction sets) that is generally well-defined and operable to control at least the camera 104, the microphone 106, the speaker 107, and the touch-sensitive display 108. For example, the camera, audio, and touch-screen framework module 200 may cause the camera 104, the microphone 106, the speaker 107, and the touch-sensitive display 108 to record images, distances to objects, sounds, and/or touches, may process images, sounds, audio signals, and/or touches, may cause images and/or sounds to be reproduced, may provide audio signals to the speaker 107, etc. The camera, audio, and touch-screen framework module 200 may vary depending on the device 102 and, more particularly, on the operating system (OS) running in the device 102. Example operating systems include iOS®, Android®, Blackberry® OS, Symbian®, Palm® OS, etc. The speaker 107 may receive audio information from the camera, audio, and touch-screen framework module 200 and may be configured to reproduce local sounds (e.g., to provide audio feedback of the user's voice, transformed or not) and remote sounds (e.g., the sounds, transformed or not, of the other party or parties engaged in a telephone call, video call, or interaction in a virtual location).

The face detection and tracking module 202 may be configured to identify and track a head, face, and/or facial region within images provided by the camera 104. For example, the face detection and tracking module 202 may include custom, proprietary, known, and/or after-developed face detection code (or instruction sets), hardware, and/or firmware that is generally well-defined and operable to receive a standard-format image (e.g., but not limited to, an RGB color image) and to identify, at least to a certain extent, a face within the image. The face detection and tracking module 202 may also be configured to track the detected face through a series of images (e.g., video frames at 24 frames per second) and to determine a head position based on the detected face. Known tracking systems that may be employed by the face detection/tracking module 202 may include particle filtering, mean shift, Kalman filtering, etc., each of which may utilize edge analysis, sum-of-square-difference analysis, feature point analysis, histogram analysis, skin tone analysis, etc.
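A minimal sketch of this kind of per-frame face detection, assuming OpenCV and its bundled Haar cascade are available; it illustrates the general approach rather than the module's actual implementation.

```python
import cv2

# A stock frontal-face Haar cascade shipped with OpenCV (an assumption of this
# sketch; the disclosure does not mandate a particular detector).
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def detect_face(frame_bgr):
    """Return the largest detected face rectangle (x, y, w, h), or None."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    return max(faces, key=lambda f: f[2] * f[3])  # largest area ~ nearest face
```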

The feature extraction module 204 may be configured to recognize features (e.g., the locations and/or shapes of facial landmarks such as eyes, eyebrows, nose, mouth, etc.) in the face detected by the face detection module 202. In one embodiment, the avatar animation may be based directly on sensed facial actions (e.g., changes in facial features) without facial expression recognition. The corresponding feature points on the avatar's face may follow or mimic the movements of the real person's face, which is known as "expression cloning" or "performance-driven facial animation". The feature extraction module 204 may include custom, proprietary, known, and/or after-developed facial characteristics recognition code (or instruction sets) that is generally well-defined and operable to receive a standard-format image (e.g., but not limited to, an RGB color image) from the camera 104 and to extract, at least to a certain extent, one or more facial characteristics in the image. Such known facial characteristics systems include, but are not limited to, the CSU Face Identification Evaluation System of Colorado State University.

The feature extraction module 204 may also be configured to recognize an expression associated with the detected features (e.g., to identify whether a previously detected face is happy, sad, smiling, frowning, surprised, excited, etc.). Thus, the feature extraction module 204 may further include custom, proprietary, known, and/or after-developed facial expression detection and/or identification code (or instruction sets) that is generally well-defined and operable to detect and/or identify expressions in a face. For example, the feature extraction module 204 may determine the size and/or position of facial features (e.g., eyes, mouth, cheeks, teeth, etc.) and may compare these facial features to a facial feature database that includes a plurality of sample facial features with corresponding facial feature classifications (e.g., smiling, frowning, excited, sad, etc.).
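One crude way to classify an expression from extracted landmarks is to compare simple geometric ratios against thresholds; the landmark names and thresholds below are assumptions made for illustration only, not the comparison scheme defined by the disclosure.

```python
def classify_expression(landmarks):
    """Very rough expression guess from 2D landmark positions (illustrative only).

    `landmarks` is assumed to map names like "mouth_left", "mouth_right",
    "mouth_top", "mouth_bottom" to (x, y) pixel coordinates.
    """
    mouth_width = landmarks["mouth_right"][0] - landmarks["mouth_left"][0]
    mouth_open = landmarks["mouth_bottom"][1] - landmarks["mouth_top"][1]
    corners_y = (landmarks["mouth_left"][1] + landmarks["mouth_right"][1]) / 2
    centre_y = (landmarks["mouth_top"][1] + landmarks["mouth_bottom"][1]) / 2
    if mouth_open > 0.5 * mouth_width:
        return "surprised"
    if corners_y < centre_y:   # corners raised above the mouth centre (image y grows downward)
        return "smiling"
    return "neutral"
```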

The audio transformation module 206 is configured to transform the user's voice into an avatar voice, i.e., a transformed user voice. Transformation includes adjusting tempo (e.g., time stretching), pitch (e.g., pitch shifting), and playback rate. For example, the audio transformation module 206 may include custom, proprietary, known, and/or after-developed audio transformation code (or instruction sets) that is generally well-defined and operable to receive voice data representative of the user's voice and to convert the voice data into transformed voice data. The voice data may be related to an audio signal based on sound captured by the microphone 106 and processed by the camera, audio, and touch-screen framework module 200. Such known voice transformation systems include, but are not limited to, SoundTouch, an open-source audio processing library configured to adjust the tempo, pitch, and playback rate of audio streams or audio files.

The audio transformation module 206 may include a plurality of predefined voice styles corresponding to transformation parameters associated with transforming the user's voice. For example, the transformation parameters may be configured to keep the transformed voice output human-sounding but with a different pitch and/or tempo. The pitch may be shifted to a higher frequency for a human female or child-like voice, the pitch may be shifted to a lower frequency for a human male voice, the tempo may be adjusted up or down to increase or decrease the speed of the speech, etc. In another example, the transformation parameters may be configured to produce a transformed voice output corresponding to an animal-like voice (e.g., a cat) and/or a cartoon-character-like voice. This may be achieved by adjusting the pitch, other frequency components, and/or sampling parameters of the user's speech.
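A rough sketch of this kind of pitch and tempo adjustment, assuming the librosa library is available; the style names and parameter values are illustrative assumptions, not settings defined by the disclosure.

```python
import librosa

def transform_voice(samples, sample_rate, style):
    """Apply a predefined voice style via pitch shift and time stretch (sketch)."""
    styles = {
        "child":   {"n_steps": +5, "rate": 1.15},  # higher pitch, faster tempo
        "male":    {"n_steps": -4, "rate": 0.95},  # lower pitch, slightly slower
        "cartoon": {"n_steps": +9, "rate": 1.30},
    }
    params = styles[style]
    shifted = librosa.effects.pitch_shift(samples, sr=sample_rate,
                                          n_steps=params["n_steps"])
    return librosa.effects.time_stretch(shifted, rate=params["rate"])
```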

The user may select a desired audio transformation output prior to initiating communication and/or may select a desired audio transformation during the communication. The audio transformation module 206 may be configured to provide a sample audio transformation output in response to a request from the user. In one embodiment, the audio transformation module 206 may include a utility that allows the user to select audio transformation parameters in order to produce a custom audio transformation output. The utility may be configured to provide a sample transformed audio output based on the user's voice input. The user may then adjust the audio transformation parameters (e.g., by trial and error) until a suitable transformation output is achieved. The audio transformation parameters associated with the suitable output for the user may then be stored and/or utilized for avatar communication, as described herein.

The touch detection module 208 is configured to receive touch data from the camera, audio, and touch-screen framework module 200 and to identify a touch event based on the received touch data. A touch event identifier may include a touch type and/or a touch location. The touch type may include a single tap, a double tap, a tap and hold, a tap and move, a pinch and stretch, a swipe, etc. The touch location may include a touch start location, a touch end location, and/or intermediate moving touch locations, etc. The touch locations may correspond to coordinates of the touch-sensitive display 108. The touch detection module 208 may include custom, proprietary, known, and/or after-developed touch detection code (or instruction sets) that is generally well-defined and operable to receive touch data and to identify a touch event.
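A minimal sketch of classifying raw touch samples into a touch event identifier of this kind; the duration/distance thresholds and type names are assumptions made for illustration.

```python
def classify_touch(samples, max_tap_duration=0.2, move_threshold=20):
    """Classify a list of (timestamp, x, y) touch samples (illustrative sketch)."""
    start_t, start_x, start_y = samples[0]
    end_t, end_x, end_y = samples[-1]
    duration = end_t - start_t
    distance = ((end_x - start_x) ** 2 + (end_y - start_y) ** 2) ** 0.5
    if distance < move_threshold:
        touch_type = "single_tap" if duration <= max_tap_duration else "tap_and_hold"
    else:
        touch_type = "swipe" if duration <= 3 * max_tap_duration else "tap_and_move"
    return {
        "touch_type": touch_type,
        "start_location": (start_x, start_y),
        "end_location": (end_x, end_y),
    }
```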

The gesture detection module 210 is configured to receive depth and/or image data from the camera, audio, and touch-screen framework module 200, to recognize a corresponding gesture based on the received depth and/or image data, and to determine a gesture identifier based on the recognized gesture. The depth corresponds to the distance from the camera to an object. The gesture identifier is related to the recognized gesture. The gesture detection module 210 may include custom, proprietary, known, and/or after-developed gesture detection code (or instruction sets) that is generally well-defined and operable to identify a gesture based on the received depth and/or image data.

For example, the gesture detection module 210 may include a database of predefined gestures. The predefined gestures may include at least some relatively common, relatively simple gestures, including an open hand, a closed hand (i.e., a fist), waving a hand, making a circular motion with a hand, moving a hand from right to left, moving a hand from left to right, etc. Thus, gestures may include static, non-moving hand gestures, active moving hand gestures, and/or combinations thereof. In one embodiment, the gesture detection module 210 may include a training utility configured to allow a user to modify the predefined gestures and/or to train new gestures. Custom gestures and/or new gestures may then be associated with gesture identifiers, and the gesture identifiers may be associated with animation commands, as described herein. For example, the user may select the animation command to associate with a gesture from a predefined list of animation commands.

Thus, an animation command is related to a desired response to a user input. The animation command may be associated with an identified user input, e.g., a touch event identifier and/or a gesture identifier. In this way, the user may interact with the displayed avatar and/or may gesture in order to modify the animation of the displayed avatar.

The avatar selection module 212 is configured to allow a user of the device 102 to select an avatar for display on a remote device. The avatar selection module 212 may include custom, proprietary, known, and/or after-developed user interface construction code (or instruction sets) that is generally well-defined and operable to present different avatars to the user so that the user may select one of the avatars. In one embodiment, one or more avatars may be predefined in the device 102. Predefined avatars allow all devices to have the same avatars, and during interaction only the selection of an avatar (e.g., the identification of a predefined avatar) needs to be communicated to the remote device or virtual space, which reduces the amount of information that needs to be exchanged. An avatar is selected prior to establishing communication, but may also be changed during the course of an active communication. Thus, it may be possible to send or receive an avatar selection at any point during the communication, and the receiving device may change the displayed avatar in accordance with the received avatar selection.

The avatar control module 214 is configured to receive a user input identifier based on a user input to the device 102. The user input identifier may include a touch event identifier determined by the touch detection module 208 based on touch event data, or a gesture identifier determined by the gesture detection module 210. Touch event data includes a touch type and/or a touch location. The touch location may correspond to coordinates associated with the touch-sensitive display 108. The touch location may be mapped to one or more points on the displayed avatar, e.g., to a feature such as the tip of the nose, the mouth, a lip, an ear, an eye, etc. A point on the displayed avatar may be related to a desired response of the avatar animation (i.e., an animation command).
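A small illustrative sketch of mapping a touch location to the nearest avatar feature point; the feature coordinates would in practice come from the avatar renderer for the current frame and are placeholders here.

```python
import math

# Assumed screen-space positions of avatar features (placeholders for this sketch).
AVATAR_FEATURE_POINTS = {
    "nose_tip": (160, 220),
    "mouth": (160, 270),
    "left_eye": (120, 170),
    "right_eye": (200, 170),
}

def map_touch_to_feature(touch_xy, max_distance=40):
    """Return the avatar feature closest to the touch, or None if it is too far away."""
    name, point = min(
        AVATAR_FEATURE_POINTS.items(),
        key=lambda item: math.dist(item[1], touch_xy),
    )
    return name if math.dist(point, touch_xy) <= max_distance else None
```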

The avatar control module 214 is configured to determine an animation command based on the user input identifier (i.e., the identified user input). The animation command is configured to identify a desired avatar animation. For example, desired animations include changing the color of the displayed avatar's face, changing the size of a feature of the displayed avatar (e.g., making the nose larger), winking, blinking, smiling, removing a feature (e.g., an ear), etc. Thus, the avatar control module 214 is configured to receive a user input identifier and to determine an animation command based on the user input identifier.

The avatar control module 214 is configured to implement avatar animation based on the animation command. In one embodiment, for interactive animations displayed on a remote device, e.g., the device 112, the animation command may be transmitted and the remote avatar control module may then implement the animation. In another embodiment, avatar parameters configured for immediate implementation of the avatar animation may be transmitted.

The implemented interactive animation based on an animation command may have a finite duration, after which the avatar animation may return to passive animation based on, e.g., face detection and tracking as described herein. Implemented interactive animations that affect the size of a feature may be configured to change the size gradually and to gradually return to the initial size. Additionally or alternatively, an animation that affects the size of a feature may be configured to have an effect gradient. In other words, the relative magnitude of the change in size may depend on the position relative to, e.g., a key vertex. Points on the displayed avatar closer to the key vertex may experience greater change than points on the displayed avatar that are relatively farther away.

Thus, the avatar control module 214 may receive a user input identifier based on a user input, may determine an animation command based on the user input identifier, and may implement an animation based on the animation command. The interactive animation based on the animation command may be time-limited to a time period (duration) and/or may include an effect gradient. The animation may return to passive avatar animation based on face detection and tracking after the time period.

The avatar control module 214 is configured to generate parameters for animating an avatar. Animation, as referenced herein, may be defined as altering the appearance of an image/model. Animation includes passive animation based on, e.g., facial expressions and/or head movement, and interactive animation based on user input. A single animation (which may include passive and interactive animation) may alter the appearance of a 2D still image, or multiple animations may occur in sequence to simulate motion in the image (e.g., head turning, nodding, blinking, talking, frowning, smiling, laughing, winking, etc.). Examples of animation for 3D models include deforming a 3D wireframe model, applying texture mapping, and recalculating model vertex normals for rendering. Changes in the position of the detected face and/or the extracted facial features may be converted into parameters that cause the avatar's features to resemble the features of the user's face. In one embodiment, the general expression of the detected face may be converted into one or more parameters that cause the avatar to exhibit the same expression. The expression of the avatar may also be exaggerated to emphasize it. Knowledge of the selected avatar may not be necessary when the avatar parameters are generally applicable to all of the predefined avatars. However, in one embodiment, the avatar parameters may be specific to the selected avatar and thus may be altered if another avatar is selected. For example, human avatars may require different parameter settings (e.g., different avatar features may be altered) than animal avatars, cartoon avatars, etc., to demonstrate emotions such as happiness, sadness, anger, surprise, etc.
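A simplified sketch of one way to turn extracted facial-feature changes into generic avatar parameters (a passive-animation step); the parameter layout and the normalization by face width are assumptions made for illustration, not values from the disclosure.

```python
def compute_passive_avatar_parameters(current_landmarks, neutral_landmarks, face_width):
    """Map landmark displacements to normalized avatar parameters (sketch)."""
    params = {}
    for name, (x, y) in current_landmarks.items():
        nx, ny = neutral_landmarks[name]
        # Normalize by face width so the parameters are resolution independent.
        params[name] = ((x - nx) / face_width, (y - ny) / face_width)
    return params
```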

The avatar control module 214 may include custom, proprietary, known, and/or after-developed graphics processing code (or instruction sets) that is generally well-defined and operable to animate the avatar selected by the avatar selection module 212 based on the face/head position detected by the face detection and tracking module 202, the facial features detected by the feature extraction module 204, and/or the user input identifier determined by the touch detection module 208 and/or the gesture detection module 210. For facial-feature-based animation methods, 2D avatar animation may be done with, for example, image warping or image morphing, whereas 3D avatar animation may be done with free-form deformation (FFD) or by utilizing an animation structure defined in a 3D model of a head. Oddcast is an example of a software resource usable for 2D avatar animation, and FaceGen is an example of a software resource usable for 3D avatar animation.

For example, for an interactive animation that includes stretching the nose of a displayed 3D avatar, a key vertex $v_k$ related to the tip of the nose may be defined (e.g., selected). An associated 3D motion vector $\vec{d}_k = (dx, dy, dz)$ and an effect radius $R$ may be defined for the key vertex $v_k$. Other vertices within the effect radius $R$ may change (i.e., move) in the interactive animation, while vertices outside the effect radius $R$ may remain unchanged by the interactive animation. The interactive animation may have an associated duration, an animation time $T$, which may extend over a plurality of frames. A temporal effect parameter $\eta_t$ may be defined based on the time $t$ and the animation time $T$ (its defining equation appears only as an image in the original document and is not reproduced here).

Vertices within the effect radius $R$ that are relatively closer to $v_k$ may change relatively more than vertices that are relatively farther from the key vertex $v_k$. A spatial effect parameter $\eta_i$ of a vertex $v_i$ may likewise be defined (its defining equation also appears only as an image in the original document and is not reproduced here).

The motion vector of vertex $v_i$ at time $t$ may then be defined as $\vec{d}_i^{\,t} = \eta_t \cdot \eta_i \cdot \vec{d}_k$. The new coordinates of the interactively animated avatar are then $\vec{v}_i^{\,t} = \vec{v}_i + \vec{d}_i^{\,t}$, where $\vec{v}_i$ corresponds to the coordinates of vertex $v_i$ based on face detection and tracking, i.e., the passive animation.

Thus, the displayed avatar may be animated with passive animation modified by interactive animation. The interactive animation may be limited in overall duration, and the magnitude of the animation's effect may vary over that duration. The interactive animation may be configured to affect only a portion of the avatar, and the effects may be greater for points closer to the key vertex. After the interactive animation completes, the animation may continue based on face detection and tracking as described herein.
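A minimal sketch of the vertex displacement described above. Because the patent's defining equations for the temporal and spatial effect parameters are not reproduced here, this sketch assumes illustrative forms for them (a sine rise-and-decay over the animation time, and a linear falloff within the effect radius); these specific functions are assumptions, not the disclosed definitions.

```python
import math

def interactive_vertex_offset(v_i, v_k, d_k, R, t, T):
    """Displacement of vertex v_i at time t for a key-vertex animation (sketch).

    v_i, v_k : (x, y, z) passive-animation coordinates of the vertex / key vertex
    d_k      : (dx, dy, dz) motion vector assigned to the key vertex
    R        : effect radius; vertices farther than R from v_k are unaffected
    t, T     : elapsed time and total animation time
    """
    dist = math.dist(v_i, v_k)
    if dist > R or not (0.0 <= t <= T):
        return (0.0, 0.0, 0.0)
    eta_t = math.sin(math.pi * t / T)   # assumed: rises then decays over [0, T]
    eta_i = 1.0 - dist / R              # assumed: linear falloff with distance
    return tuple(eta_t * eta_i * d for d in d_k)

def animate_vertex(v_i, v_k, d_k, R, t, T):
    """New coordinates: passive position plus the interactive displacement."""
    offset = interactive_vertex_offset(v_i, v_k, d_k, R, t, T)
    return tuple(p + o for p, o in zip(v_i, offset))
```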

In addition, in the system 100, the avatar control module 214 may receive a remote avatar selection and remote avatar parameters usable for displaying and animating an avatar corresponding to the user at a remote device. The animation may include passive animation as well as interactive animation. The avatar control module may cause a display module 216 to display the avatar 110 on the display 108. The display module 216 may include custom, proprietary, known, and/or after-developed graphics processing code (or instruction sets) that is generally well-defined and operable to display and animate an avatar on the display 108 in accordance with the example device-to-device embodiment. For example, the avatar control module 214 may receive a remote avatar selection and may interpret the remote avatar selection to correspond to a predetermined avatar. The display module 216 may then display the avatar 110 on the display 108. In addition, remote avatar parameters received by the avatar control module 214 may be interpreted, and commands may be provided to the display module 216 to animate the avatar 110. In one embodiment, more than two users may engage in a video call. When more than two users are interacting in a video call, the display 108 may be divided or segmented to allow more than one avatar corresponding to remote users to be displayed simultaneously. Alternatively, in the system 126, the avatar control module 214 may receive information causing the display module 216 to display what the avatar corresponding to the user of the device 102 "sees" in the virtual space 128 (e.g., from the virtual perspective of the avatar). For example, the display 108 may display buildings, objects, and animals represented in the virtual space 128, other avatars, etc.

In one embodiment, the avatar control module 214 may be configured to cause the display module 216 to display a "feedback" avatar 218. The feedback avatar 218 represents how the selected avatar appears on the remote device, in a virtual location, etc. In particular, the feedback avatar 218 appears as the avatar selected by the user and may be animated using the same parameters generated by the avatar control module 214. In this way, the user may confirm what the remote user is seeing during their interaction. The feedback avatar 218 may also be used to display interactive animations caused by inputs of the remote user of the device 112. Thus, a local user may interact with his or her feedback avatar (e.g., the avatar 218 for the user of the device 102) to cause an interactive animation of the associated avatar to be displayed to the remote user on the device 112. The local user may similarly interact with the displayed avatar of the remote user (e.g., the avatar 110), causing an interactive animation of the remote user's feedback avatar to be displayed on the device 112.

The communication module 220 is configured to transmit and receive information for selecting avatars, displaying avatars, animating avatars, displaying a virtual place perspective, etc. The communication module 220 may include custom, proprietary, known, and/or after-developed communication processing code (or instruction sets) that is generally well-defined and operable to transmit avatar selections, avatar parameters, animation commands, and interactive avatar parameters, and to receive remote avatar selections, remote avatar parameters, remote animation commands, and remote interactive avatar parameters. The communication module 220 may also transmit and receive audio information corresponding to avatar-based interactions. The communication module 220 may transmit and receive the above information via the network 122, as previously described.

處理器222係組配來執行與裝置102及其中所包括模組的一或多者相關聯之操作。 The processor 222 is configured to perform operations associated with the device 102 and one or more of the modules included therein.

圖3例示根據至少一實施例之示例系統實行方案。裝置102'係組配來經由WiFi連接300來無線地通訊(例如在工作時),伺服器124'係組配來經由網際網路302協商裝置102'與112'之間的連接,且裝置112'係組配來經由另一WiFi連接304來無線地通訊(例如在家時)。在一實施例中,基於裝置至裝置化身之視訊呼叫應用程式在裝置102'中啟動。在化身選擇之後,應用程式可允許選擇至少一遠程裝置(例如裝置112')。應用程式可隨後使裝置102'起始與裝置112'之通訊。通訊可以裝置102'經由企業存取點(AP)306傳輸連接建立請求至裝置112'來起始。企業AP 306可為可用於商業設置之AP,且因此可支援比家AP 314高的資料通量及更多的並行無線客戶端。企業AP 306可接收來自裝置102'之無線信號,且可經由各種商用網路,經由閘道308進行對連接建立請求的傳輸。連接建立請求可隨後通過防火牆310,該防火牆可組配來控制流入及流出WiFi網路300之資訊。 FIG. 3 illustrates an example system implementation scheme according to at least one embodiment. The device 102' is configured to communicate wirelessly via the WiFi connection 300 (eg, at work), the server 124' is configured to negotiate the connection between the devices 102' and 112' via the Internet 302, and the device 112 It is configured to communicate wirelessly via another WiFi connection 304 (eg at home). In one embodiment, the video calling application based on the device-to-device avatar is launched in the device 102'. After the avatar selection, the application may allow selection of at least one remote device (eg, device 112'). The application can then cause device 102' to initiate communication with device 112'. The communication may be initiated by the device 102' transmitting an establishment request via the enterprise access point (AP) 306 to the device 112'. The enterprise AP 306 may be an AP that can be used in a commercial setting, and thus can support higher data throughput and more parallel wireless clients than the home AP 314. The enterprise AP 306 can receive the wireless signal from the device 102', and can transmit the connection establishment request through the gateway 308 through various commercial networks. The connection establishment request can then pass through the firewall 310, which can be configured to control information flowing into and out of the WiFi network 300.

裝置102'之連接建立請求可隨後藉由伺服器124'處理。伺服器124'可組配來登記IP位址、鑑別目的地位址及NAT穿越，以便連接建立請求可導向網際網路302上的正確目的地。例如，伺服器124'可自接收自裝置102的連接建立請求中的資訊來解析所欲之目的地(例如遠程裝置112')，且可將信號安排路由傳遞穿過正確NAT、埠及因此到達目的地IP位址。此等操作可僅必須在連接建立期間執行，此取決於網路組態。在一些情況下，可在視訊呼叫期間重複操作以便向NAT提供通知來保持連接有效。媒體及信號路徑312可在已建立連接之後將視訊(例如化身選擇及/或化身參數)及音訊資訊指導攜帶至家AP 314。裝置112'可隨後接收連接建立請求且可組配來判定是否接受該請求。判定是否接受該請求可包括例如向查詢關於是否接收來自裝置102'之連接請求的裝置112'之使用者呈現視覺敘事。裝置112'之使用者接收該連接(例如，接收該視訊呼叫)，即可建立該連接。攝影機104'及114'可組配來隨後開始分別擷取裝置102'及112'之各自使用者的影像，以用於是藉由各使用者選擇之化身成動畫。麥克風106'及116'可組配來隨後開始擷取來自各使用者之音訊。當在裝置102'及112'之間開始資訊交換時，顯示器108'及118'可顯示相應於裝置102'及112'之使用者的化身且使該等化身成動畫。 The connection establishment request from the device 102' may then be processed by the server 124'. The server 124' may be configured to register IP addresses, authenticate destination addresses and perform NAT traversal so that the connection establishment request can be directed to the correct destination on the Internet 302. For example, the server 124' may resolve the intended destination (e.g., the remote device 112') from the information in the connection establishment request received from the device 102, and may route the signal through the correct NATs and ports to the destination IP address. Depending on the network configuration, these operations may only need to be performed during connection establishment. In some instances, the operations may be repeated during the video call in order to provide notifications to the NAT to keep the connection alive. After the connection has been established, the media and signal path 312 may carry video (e.g., avatar selections and/or avatar parameters) and audio information to the home AP 314. The device 112' may then receive the connection establishment request and may be configured to determine whether to accept the request. Determining whether to accept the request may include, for example, presenting a visual notification to the user of the device 112' inquiring whether to accept the connection request from the device 102'. If the user of the device 112' accepts the connection (e.g., accepts the video call), the connection may be established. The cameras 104' and 114' may be configured to then begin capturing images of the respective users of the devices 102' and 112', for use in animating the avatar selected by each user. The microphones 106' and 116' may be configured to then begin capturing audio from each user. As information exchange begins between the devices 102' and 112', the displays 108' and 118' may display and animate the avatars corresponding to the users of the devices 102' and 112'.
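The client-side portion of this connection setup could look roughly like the sketch below, assuming a reachable signaling service in the role of server 124'. The server address, message fields and keep-alive interval are all invented for illustration, and a real implementation would also handle responses, authentication and the NAT traversal details the server performs.

```python
# Illustrative client-side connection setup; the server address and message
# fields are assumptions, not values taken from the patent. Running this
# requires a reachable signaling service at the assumed address.
import json
import socket
import time

SIGNALING_SERVER = ("signaling.example.com", 5000)   # assumed server, cf. 124'

def send_message(sock: socket.socket, message: dict) -> None:
    sock.sendall((json.dumps(message) + "\n").encode("utf-8"))

def request_connection(local_id: str, remote_id: str) -> socket.socket:
    sock = socket.create_connection(SIGNALING_SERVER, timeout=10)
    # Register this device and ask the server to reach the remote device;
    # the server would handle destination lookup and NAT traversal.
    send_message(sock, {"type": "register", "device": local_id})
    send_message(sock, {"type": "connect_request", "from": local_id, "to": remote_id})
    return sock

def keep_alive(sock: socket.socket, interval_s: float = 30.0, rounds: int = 3) -> None:
    # During a call, periodic messages can keep NAT bindings valid.
    for _ in range(rounds):
        send_message(sock, {"type": "keep_alive"})
        time.sleep(interval_s)
```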

圖4例示與本揭示案之一實施例一致的示範性操作的流程圖400。該等操作可例如藉由裝置102及/或112執行。詳言之,流程圖400描繪組配來實行化身動畫(包括被動動畫及/或互動動畫)及/或音訊轉換以用於裝置之間經由網路的通訊的操作。假定面部偵測及追蹤、特徵提取及被動化身動畫如本文所述加以實行及操作。 FIG. 4 illustrates a flowchart 400 of exemplary operations consistent with one embodiment of the present disclosure. Such operations may be performed by the device 102 and/or 112, for example. In detail, the flowchart 400 depicts an operation to implement avatar animation (including passive animation and/or interactive animation) and/or audio conversion for communication between devices via a network. It is assumed that face detection and tracking, feature extraction, and passive avatar animation are implemented and operated as described herein.

化身模型可在操作402選擇。化身模型可包括視訊化身選擇及音訊轉換選擇。可顯示多個視訊化身模型, 使用者可自該等視訊化身模型選擇一所要化身。在一實施例中,選擇視訊化身模型可包括相關聯音訊轉換。例如,如貓的化身可與如貓的音訊轉換相關聯。在另一實施例中,音訊轉換可獨立於該視訊化身選擇來選擇。 The avatar model may be selected in operation 402. The avatar model may include video avatar selection and audio conversion selection. Multiple video avatar models can be displayed, and the user can select a desired avatar from these video avatar models. In one embodiment, selecting the video avatar model may include associated audio conversion. For example, a cat-like avatar may be associated with a cat-like audio conversion. In another embodiment, audio conversion can be selected independently of the video avatar selection.
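A small sketch of how an avatar catalogue might pair each video avatar with a default audio transform, while still allowing the transform to be chosen independently, is given below. The catalogue entries, transform names and the helper select_avatar_model are hypothetical.

```python
# Hypothetical avatar catalogue pairing each video avatar with a default
# audio transform; entries and transform names are illustrative only.
from typing import Optional

AVATAR_CATALOGUE = {
    "cat":   {"model": "models/cat.mesh",   "audio_transform": "cat_voice"},
    "robot": {"model": "models/robot.mesh", "audio_transform": "robot_voice"},
    "human": {"model": "models/human.mesh", "audio_transform": None},
}

def select_avatar_model(avatar_id: str, audio_override: Optional[str] = None) -> dict:
    """Return the selected avatar model and its audio transform.

    The audio transform defaults to the one associated with the avatar but,
    as described above, it may also be chosen independently.
    """
    entry = dict(AVATAR_CATALOGUE[avatar_id])
    if audio_override is not None:
        entry["audio_transform"] = audio_override
    return entry

if __name__ == "__main__":
    print(select_avatar_model("cat"))
    print(select_avatar_model("human", audio_override="cat_voice"))
```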

包括音訊轉換之化身模型可在啟動通訊之前選擇,但亦可在活動通訊的過程中加以改變。因此,可能於通訊期間任何點處發送或接收化身選擇及/或改變音訊轉換選擇,且接收裝置可能根據所接收之化身選擇來改變所顯示化身。 The avatar model including the audio conversion can be selected before starting the communication, but it can also be changed during the active communication. Therefore, the avatar selection may be sent or received and/or the audio conversion selection may be changed at any point during the communication, and the receiving device may change the displayed avatar according to the received avatar selection.

化身通訊可在操作404啟動。例如，使用者可運行組配來使用如本文所述化身傳達音訊及視訊之應用程式。操作404可包括組配通訊及建立連接。通訊組態包括識別參與視訊呼叫之至少一遠程裝置或虛擬空間。例如，使用者可自儲存於應用程式內、儲存於與另一系統相關聯的裝置內(例如智慧型電話、手機等等中的聯絡人清單)、遠程儲存於諸如網際網路(例如，如Facebook、LinkedIn、Yahoo、Google+、MSN等等的社交媒體網站)上的遠程使用者/裝置之清單中進行選擇。或者，使用者可選擇在如Second Life的虛擬空間中進行線上操作。 The avatar communication may be initiated at operation 404. For example, the user may launch an application configured to communicate audio and video using avatars as described herein. Operation 404 may include configuring the communication and establishing a connection. The communication configuration includes identifying at least one remote device or a virtual space to participate in the video call. For example, the user may select from a list of remote users/devices that is stored within the application, stored in association with another system (e.g., a contacts list in a smart phone, cell phone, etc.), or stored remotely, such as on the Internet (e.g., on social media websites such as Facebook, LinkedIn, Yahoo, Google+, MSN, etc.). Alternatively, the user may choose to go online in a virtual space such as Second Life.

在操作406,裝置中之攝影機可隨後開始擷取影像及/或深度,且裝置中之麥克風可開始擷取聲音。影像可為靜止影像或活動影像(例如,依次擷取的多個影像)。深度可與影像一起擷取或可獨立地擷取。深度相應於攝影機之視場中攝影機至物體(及物體上之點)的距離。可在操作408 判定是否偵測到使用者輸入。使用者輸入包括藉由影像及/或深度攝影機擷取的手勢及在觸摸感應顯示器上偵測到之觸摸輸入。若偵測到使用者輸入,則可在操作410識別使用者輸入。使用者輸入識別符包括觸摸識別符或手勢識別符。觸摸識別符可基於對觸摸感應顯示器的觸摸來判定且可包括觸摸類型及觸摸位置。手勢識別符可基於所擷取影像及/或深度資料來判定且可包括辨識手勢。 In operation 406, the camera in the device may then start capturing images and/or depth, and the microphone in the device may start capturing sound. The image may be a still image or a moving image (for example, multiple images captured in sequence). The depth can be captured together with the image or can be captured independently. The depth corresponds to the distance from the camera to the object (and the point on the object) in the camera's field of view. It can be determined in operation 408 whether user input is detected. User input includes gestures captured by the image and/or depth camera and touch input detected on the touch-sensitive display. If a user input is detected, the user input can be identified in operation 410. The user input identifier includes a touch identifier or a gesture identifier. The touch identifier may be determined based on the touch on the touch-sensitive display and may include the touch type and the touch position. The gesture identifier may be determined based on the captured image and/or depth data and may include recognizing gestures.
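The classification of raw input into a touch identifier (touch type plus touch location) or a gesture identifier could be sketched as follows. The distance and duration thresholds, the label names and the trivial gesture rule are assumptions; a real system would run a proper gesture recogniser over the captured image and depth data.

```python
# Illustrative classification of raw user input into the identifiers
# described above; thresholds and gesture labels are assumptions.
import math

def identify_touch(x0, y0, x1, y1, duration_s):
    """Return a touch identifier: touch type plus touch location."""
    distance = math.hypot(x1 - x0, y1 - y0)
    if distance < 10 and duration_s < 0.3:
        touch_type = "tap"
    elif distance < 10:
        touch_type = "press"
    else:
        touch_type = "swipe"
    return {"kind": "touch", "type": touch_type, "location": (x0, y0)}

def identify_gesture(depth_frames):
    """Return a gesture identifier from captured image/depth data.

    A real implementation would run a gesture recogniser over the frames;
    here the decision is reduced to a trivial placeholder rule.
    """
    if not depth_frames:
        return None
    return {"kind": "gesture", "name": "wave"}   # placeholder result

if __name__ == "__main__":
    print(identify_touch(120, 80, 122, 81, 0.15))   # -> a "tap" identifier
    print(identify_touch(120, 80, 300, 90, 0.40))   # -> a "swipe" identifier
```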

可在操作412識別動畫命令。動畫命令可組配來使顯示於遠程裝置上的使用者之所選擇化身成動畫,或使亦顯示於遠程使用者之裝置上的遠程使用者之回饋化身成動畫。動畫命令相應於與使用者輸入相關聯的所要響應。例如,觸摸所顯示化身的臉部(使用者輸入)可產生所顯示化身的臉部之顏色改變(藉由動畫命令識別的所要響應)。動畫命令可基於所識別之使用者輸入來識別。例如,各使用者輸入可與具有使用者輸入識別符及動畫命令之資料庫中的動畫命令有關(例如與之相關聯)。 The animation command may be recognized at operation 412. The animation commands can be configured to animate the selected avatar of the user displayed on the remote device, or animate the feedback of the remote user also displayed on the remote user's device. The animation commands correspond to the desired responses associated with user input. For example, touching the face of the displayed avatar (user input) can produce a color change of the face of the displayed avatar (the desired response identified by the animation command). The animation commands can be recognized based on the recognized user input. For example, each user input may be related to (eg, be associated with) animation commands in a database with user input identifiers and animation commands.
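The association between user input identifiers and animation commands can be pictured as a simple lookup table, along the lines of the sketch below. The table contents, key layout and helper name identify_animation_command are illustrative assumptions; the region value (which part of the displayed avatar was touched) is assumed to come from hit-testing the avatar on screen.

```python
# Hypothetical lookup table relating user input identifiers to animation
# commands, analogous to the database described above.

ANIMATION_COMMANDS = {
    ("touch", "tap", "face"):   "blush_face",   # e.g. colour change of the face
    ("touch", "swipe", "head"): "shake_head",
    ("gesture", "wave", None):  "wave_back",
}

def identify_animation_command(user_input, region=None):
    key = (user_input["kind"],
           user_input.get("type") or user_input.get("name"),
           region)
    return ANIMATION_COMMANDS.get(key)

if __name__ == "__main__":
    tap_on_face = {"kind": "touch", "type": "tap", "location": (120, 80)}
    print(identify_animation_command(tap_on_face, region="face"))  # blush_face
```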

操作414包括產生化身參數。化身參數包括被動組件且可包括互動組件。若未偵測到使用者輸入,則化身參數可包括被動組件。若偵測到使用者輸入,則化身參數是否可包括互動組件取決於動畫命令並因此取決於使用者輸入。對於相應於組配來使使用者之所選擇化身成動畫的動畫命令之使用者輸入而言,動畫命令可與僅包括被動組件之化身參數一起傳輸或可在傳輸之前應用於化身參數,以便所傳輸之化身參數包括被動組件及互動組件。對於相 應於組配來使顯示於遠程使用者之裝置上的遠程使用者之回饋化身成動畫的動畫命令之輸入而言,可僅傳輸動畫命令。 Operation 414 includes generating avatar parameters. The avatar parameters include passive components and may include interactive components. If no user input is detected, the avatar parameters may include passive components. If user input is detected, whether the avatar parameters can include interactive components depends on the animation command and therefore on the user input. For user input corresponding to an animation command configured to animate the selected avatar of the user, the animation command may be transmitted together with the avatar parameters including only passive components or may be applied to the avatar parameters before transmission, so that all The transmitted avatar parameters include passive components and interactive components. For the input of the animation command corresponding to the configuration to make the feedback of the remote user displayed on the remote user's device into an animation, only the animation command may be transmitted.
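The choice between applying the animation command locally before transmission and sending it alongside the passive parameters can be illustrated as follows. The parameter names and the specific command effects are invented for the example.

```python
# Sketch of assembling avatar parameters from a passive (face-tracking)
# component and an interactive (user-input) component. Names are illustrative.

def passive_parameters(facial_features):
    # Derived from face detection/tracking, e.g. mouth and eye openness.
    return {"mouth_open": facial_features.get("mouth_open", 0.0),
            "eye_open": facial_features.get("eye_open", 1.0)}

def apply_animation_command(params, command):
    interactive = dict(params)
    if command == "blush_face":
        interactive["face_color"] = (1.0, 0.6, 0.6)
    elif command == "shake_head":
        interactive["head_yaw_sequence"] = [-15, 15, -15, 0]
    return interactive

def build_outgoing_payload(facial_features, command=None, apply_locally=True):
    params = passive_parameters(facial_features)
    if command is None:
        return {"avatar_parameters": params}
    if apply_locally:
        # Modification performed locally: transmit already-modified parameters.
        return {"avatar_parameters": apply_animation_command(params, command)}
    # Modification performed remotely: transmit the command with passive parameters.
    return {"avatar_parameters": params, "animation_command": command}

if __name__ == "__main__":
    features = {"mouth_open": 0.3}
    print(build_outgoing_payload(features, command="blush_face", apply_locally=False))
```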

操作416包括轉換及編碼所擷取音訊。所擷取音訊可轉化成音訊信號(例如使用者語音信號)。使用者語音信號可根據操作402之化身選擇的音訊轉換部分來轉換。經轉換之使用者語音信號相應於化身語音信號。化身語音信號可使用已知用於經由網路傳輸至遠程裝置及/或虛擬空間的技術來編碼。可在操作418處傳輸經轉換及編碼之音訊。操作418可進一步包括傳輸動畫命令及化身參數中之至少一者。傳輸動畫命令係組配來允許遠程裝置藉由根據動畫命令修改化身參數而使本地所顯示化身成動畫。已在傳輸之前根據動畫命令修改的經傳輸化身參數可直接用來使顯示於遠程裝置上的化身成動畫。換言之,由動畫命令表示的對化身參數之修改可在本地執行或遠程執行。 Operation 416 includes converting and encoding the captured audio. The captured audio can be converted into audio signals (such as user voice signals). The user's voice signal can be converted according to the audio conversion portion selected by the avatar of operation 402. The converted user voice signal corresponds to the avatar voice signal. The avatar voice signal can be encoded using techniques known for transmission to remote devices and/or virtual spaces via the network. The converted and encoded audio may be transmitted at operation 418. Operation 418 may further include transmitting at least one of animation commands and avatar parameters. The transmission animation command is configured to allow the remote device to animate the locally displayed avatar by modifying the avatar parameters according to the animation command. The transmitted avatar parameters that have been modified according to the animation commands before transmission can be directly used to animate the avatar displayed on the remote device. In other words, the modification of the avatar parameters represented by the animation command can be performed locally or remotely.
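As a rough illustration of the kind of voice transform mentioned here, the sketch below shifts pitch by resampling with linear interpolation. This is deliberately crude: raising the pitch this way also shortens the signal, whereas a production avatar voice would typically use a phase vocoder or similar so that pitch and duration can be changed independently, and the result would then be encoded with a standard speech codec before transmission.

```python
# A crude, illustrative pitch shift by resampling; not the patent's method.
import numpy as np

def pitch_shift(samples: np.ndarray, factor: float) -> np.ndarray:
    """Resample `samples` so the perceived pitch is multiplied by `factor`."""
    n_out = int(len(samples) / factor)
    old_idx = np.linspace(0, len(samples) - 1, num=n_out)
    return np.interp(old_idx, np.arange(len(samples)), samples)

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    voice = np.sin(2 * np.pi * 220 * t)      # stand-in for captured speech
    avatar_voice = pitch_shift(voice, 1.5)   # higher, cartoon-like voice
    print(len(voice), len(avatar_voice))     # shifted version is shorter
```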

操作420包括接收可為經轉換音訊之遠程編碼音訊。操作420進一步包括接收遠程動畫命令及遠程化身參數中之至少一者。遠程動畫命令可用來修改相應於遠程使用者之所顯示化身或本地使用者之所顯示回饋化身的化身參數。動畫命令及化身參數係組配來產生基於使用者輸入加以修改的化身動畫。在操作422處,所接收之音訊可獲解碼及播放,且在操作424處,化身可獲顯示及成動畫。 Operation 420 includes receiving remotely encoded audio that may be converted audio. Operation 420 further includes receiving at least one of a remote animation command and remote avatar parameters. The remote animation command can be used to modify the avatar parameters corresponding to the displayed avatar of the remote user or the displayed feedback avatar of the local user. Animation commands and avatar parameters are combined to generate avatar animations that are modified based on user input. At operation 422, the received audio can be decoded and played, and at operation 424, the avatar can be displayed and animated.

所顯示化身之動畫可基於所偵測及識別之使用者輸入,如本文所述。在裝置至裝置通訊(例如系統100) 之示例中,遠程化身選擇或遠程化身參數中至少一者可接收自遠程裝置。相應於遠程使用者之化身可隨後基於所接收之遠程化身選擇來顯示,且可基於所接收之遠程化身參數而成動畫。在虛擬位置交互作用(例如系統126)之示例中,可接收允許裝置顯示相應於裝置使用者之化身所看見的內容的資訊。 The animation of the displayed avatar may be based on detected and recognized user input, as described herein. In an example of device-to-device communication (eg, system 100), at least one of remote avatar selection or remote avatar parameters may be received from the remote device. The avatar corresponding to the remote user can then be displayed based on the received remote avatar selection, and can be animated based on the received remote avatar parameters. In an example of virtual location interaction (eg, system 126), information may be received that allows the device to display content corresponding to what the device user's avatar sees.

可在操作426處判定通訊是否完成。若通訊完成,即可在操作428處結束程式流。若通訊未完成,程式流即可繼續進行至操作406,擷取影像、深度及/或音訊。 It can be determined whether the communication is completed at operation 426. If the communication is completed, the program flow can be ended at operation 428. If the communication is not completed, the program flow can proceed to operation 406 to capture images, depth and/or audio.
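Putting operations 406 through 426 together, the per-frame flow can be summarised by the loop below. Every method on the stub session object is a placeholder standing in for the capture, identification, transformation, transmission, reception and rendering steps described above; none of these names comes from the patent.

```python
# End-to-end sketch of the per-frame loop in operations 406-426.

class StubSession:
    def __init__(self, frames=3):
        self.frames = frames
    def finished(self):                                # operation 426
        self.frames -= 1
        return self.frames < 0
    def capture(self):                                 # operation 406
        return "image", "depth", "audio"
    def detect_user_input(self, image, depth):         # operation 408
        return {"kind": "touch", "type": "tap"}
    def identify_input(self, user_input):              # operation 410
        return (user_input["kind"], user_input["type"])
    def identify_animation_command(self, identifier):  # operation 412
        return "blush_face" if identifier == ("touch", "tap") else None
    def generate_avatar_parameters(self, image, command):   # operation 414
        return {"mouth_open": 0.2, "command": command}
    def transform_audio(self, audio):                  # operation 416
        return "avatar_" + audio
    def send(self, audio, params, command):            # operation 418
        print("send", audio, params, command)
    def receive(self):                                 # operation 420
        return {"audio": "remote_audio", "params": {"mouth_open": 0.5}, "command": None}
    def play_audio(self, audio):                       # operation 422
        print("play", audio)
    def render_avatar(self, params, command):          # operation 424
        print("render", params, command)

def communication_loop(session):
    while not session.finished():
        image, depth, audio = session.capture()
        user_input = session.detect_user_input(image, depth)
        command = None
        if user_input is not None:
            identifier = session.identify_input(user_input)
            command = session.identify_animation_command(identifier)
        params = session.generate_avatar_parameters(image, command)
        session.send(session.transform_audio(audio), params, command)
        remote = session.receive()
        session.play_audio(remote["audio"])
        session.render_avatar(remote["params"], remote["command"])

if __name__ == "__main__":
    communication_loop(StubSession())
```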

雖然圖4例示根據一實施例之各種操作,但是要理解的是,並非圖4中描繪的所有操作皆為其他實施例所必需。事實上,本文完全涵蓋的是,本揭示案之其他實施例、圖4中描繪之操作及/或本文描述之其他操作均可以一方式組合,該組合方式並未明確展示於隨附圖式之任何圖式中,但仍完全與本揭示案一致。因此,針對並未確切展示於一圖式中的特徵及/或操作的請求項被視為屬於本揭示案之範疇及內容。 Although FIG. 4 illustrates various operations according to an embodiment, it is to be understood that not all operations depicted in FIG. 4 are necessary for other embodiments. In fact, this article fully covers that other embodiments of the present disclosure, the operations depicted in FIG. 4 and/or other operations described herein can be combined in one way, which is not explicitly shown in the accompanying drawings In any scheme, but still completely consistent with this disclosure. Therefore, requests for features and/or operations that are not exactly shown in a drawing are considered to be within the scope and content of this disclosure.

如本文中任何實施例所使用,「應用程式(app)」一詞可以代碼或指令體現,該等代碼或指令可在諸如主機處理器的可規劃電路或其他可規劃電路上執行。 As used in any of the embodiments herein, the term "app" can be embodied in code or instructions that can be executed on a programmable circuit such as a host processor or other programmable circuit.

如本文中任何實施例所使用,「模組」一詞可代表app、軟體、韌體及/或電路,其組配來執行上述操作中之任何操作。軟體可體現為套裝軟體、記錄於至少一非暫時性電腦可讀儲存媒體上之代碼、指令、指令集及/或資 料。韌體可體現為硬編碼(例如非依電性)於記憶體裝置中的代碼、指令或指令集及/或資料。 As used in any of the embodiments herein, the term "module" may represent an app, software, firmware, and/or circuit, which is configured to perform any of the above operations. The software may be embodied as a set of software, codes, instructions, instruction sets and/or materials recorded on at least one non-transitory computer-readable storage medium. The firmware may be embodied as codes, instructions or instruction sets and/or data hard-coded (eg, non-electrically dependent) in the memory device.

如本文中任何實施例所使用，「電路」可包含例如單獨的或呈任何組合的硬連線電路；可規劃電路，諸如包含一或多個單獨指令處理核心之電腦處理器；狀態機電路及/或儲存藉由可規劃電路執行之指令的韌體。模組可共同地或單獨地體現為形成大型系統之部分的電路，例如積體電路(IC)、系統單晶片(SoC)、桌上型電腦、膝上型電腦、平板電腦、伺服器、智慧型電話等等。 As used in any embodiment herein, "circuitry" may comprise, for example, singly or in any combination, hardwired circuitry; programmable circuitry, such as computer processors comprising one or more individual instruction processing cores; state machine circuitry; and/or firmware that stores instructions executed by programmable circuitry. The modules may, collectively or individually, be embodied as circuitry that forms part of a larger system, for example, an integrated circuit (IC), a system on-chip (SoC), desktop computers, laptop computers, tablet computers, servers, smart phones, etc.

如此所描述之任何操作可實行於包括一或多個儲存媒體之系統中,該等儲存媒體上儲存有單獨的或呈組合的指令,在藉由一或多個處理器執行該等指令時,該等指令執行該等方法。在此,處理器可包括例如伺服器CPU、行動裝置CPU及/或其他可規劃電路。此外,本文描述之操作意欲可跨越多個實體裝置來分散,該等實體裝置諸如處在一個以上不同實體位置處的處理結構。儲存媒體可包括任何類型的有形媒體,例如,任何類型之碟片,包括硬碟、軟碟片、光碟、光碟片-唯讀記憶體(CD-ROM)、可重寫光碟片(CD-RW)及磁光碟;半導體裝置,諸如唯讀記憶體(ROM)、隨機存取記憶體(RAM)(諸如動態及靜態RAM)、可抹除可規劃唯讀記憶體(EPROM)、電氣可抹除可規劃唯讀記憶體(EEPROM)、快閃記憶體、固態碟片(SSD)、磁性或光學卡;或者適合於儲存電子指令的任何類型之媒體。其他實施例可實行為藉由可規劃控制裝置執行之軟體模 組。儲存媒體可為非暫時性的。 Any of the operations described in this way can be implemented in a system that includes one or more storage media on which separate or combined instructions are stored. When the instructions are executed by one or more processors, These instructions perform these methods. Here, the processor may include, for example, a server CPU, a mobile device CPU, and/or other programmable circuits. In addition, the operations described herein are intended to be distributed across multiple physical devices, such as processing structures at more than one different physical location. Storage media can include any type of tangible media, for example, any type of disc, including hard disk, floppy disk, optical disc, optical disc-read only memory (CD-ROM), rewritable optical disc (CD-RW) ) And magneto-optical disks; semiconductor devices, such as read-only memory (ROM), random-access memory (RAM) (such as dynamic and static RAM), erasable and programmable read-only memory (EPROM), and electrically erasable Programmable read-only memory (EEPROM), flash memory, solid state disk (SSD), magnetic or optical cards; or any type of media suitable for storing electronic commands. Other embodiments may be implemented as software modules executed by programmable control devices. The storage medium may be non-transitory.

因此，本揭示案提供一種用於使化身交互地成動畫以替代活動影像來進行視訊通訊的方法及系統。與活動影像之發送相比，化身之使用減少要交換的資訊之量。該系統及方法進一步組配來藉由例如音調偏移及/或使所擷取音訊信號時間延長而將使用者語音轉換成化身語音。化身之互動動畫可基於所偵測之使用者輸入，包括觸摸及手勢。互動動畫係組配來修改基於面部偵測及追蹤判定之動畫。 Accordingly, the present disclosure provides a method and system for interactively animating avatars in place of live images for video communication. The use of avatars reduces the amount of information to be exchanged compared with the sending of live images. The system and method are further configured to transform the user's voice into an avatar voice by, for example, pitch shifting and/or time stretching the captured audio signal. Interactive animation of an avatar may be based on detected user input, including touches and gestures. The interactive animations are configured to modify the animations determined based on face detection and tracking.

根據一態樣,提供一種系統。該系統可包括:使用者輸入裝置,其組配來擷取使用者輸入;通訊模組,其組配來傳輸及接收資訊;以及一或多個儲存媒體。此外,該一或多個儲存媒體上儲存有單獨的或呈組合的指令,在藉由一或多個處理器執行該等指令時產生以下操作,包含:選擇化身;起始通訊;偵測使用者輸入;識別使用者輸入;基於使用者輸入識別動畫命令;產生化身參數;以及傳輸動畫命令及化身參數中之至少一者。 According to one aspect, a system is provided. The system may include: a user input device configured to capture user input; a communication module configured to transmit and receive information; and one or more storage media. In addition, individual or combined instructions are stored on the one or more storage media, and when the instructions are executed by one or more processors, the following operations are generated, including: selecting an avatar; initiating communication; detecting usage Input; recognize user input; recognize animation commands based on user input; generate avatar parameters; and transmit at least one of animation commands and avatar parameters.

另一示例系統包括前述組件且進一步包括:麥克風,其組配來擷取聲音且將所擷取之聲音轉化成相應音訊信號;以及指令,當藉由一或多個處理器執行該等指令時產生以下額外操作:擷取使用者語音且將使用者語音轉化成相應使用者語音信號;將使用者語音信號轉換成化身語音信號;以及傳輸化身語音信號。 Another example system includes the aforementioned components and further includes: a microphone configured to capture sound and convert the captured sound into a corresponding audio signal; and instructions when the instructions are executed by one or more processors The following additional operations are generated: capturing user voice and converting user voice into corresponding user voice signals; converting user voice signals into avatar voice signals; and transmitting avatar voice signals.

另一示例系統包括前述組件且進一步包括:攝影 機,其組配來擷取影像;以及指令,當藉由一或多個處理器執行該等指令時產生以下額外操作:擷取影像;偵測影像中的臉部;自臉部提取特徵;以及將特徵轉化成化身參數。 Another example system includes the aforementioned components and further includes: a camera configured to capture images; and instructions that, when executed by one or more processors, generate the following additional operations: capture images; detect images Face in; extracting features from the face; and transforming features into avatar parameters.

另一示例系統包括前述組件且進一步包括:顯示器;以及指令,當藉由一或多個處理器執行該等指令時產生以下額外操作:顯示至少一化身;接收遠程動畫命令及遠程化身參數中之至少一者;以及基於遠程動畫命令及遠程化身參數中之至少一者使一所顯示化身成動畫。 Another example system includes the aforementioned components and further includes: a display; and instructions that, when executed by one or more processors, generate the following additional operations: displaying at least one avatar; receiving remote animation commands and remote avatar parameters At least one; and animate a displayed avatar based on at least one of remote animation commands and remote avatar parameters.

另一示例系統包括前述組件且進一步包括:揚聲器,其組配來將音訊信號轉換成聲音;以及指令,當藉由一或多個處理器執行該等指令時產生以下額外操作:接收遠程化身語音信號;以及將遠程化身語音信號轉化成化身語音。 Another example system includes the aforementioned components and further includes: a speaker configured to convert the audio signal into sound; and instructions that generate the following additional operations when executed by one or more processors: receiving remote avatar voice Signal; and convert the remote avatar voice signal into avatar voice.

另一示例系統包括前述組件,且該使用者輸入裝置為組配來擷取距離之攝影機且該使用者輸入為手勢。 Another example system includes the aforementioned components, and the user input device is a camera configured to capture a distance and the user input is a gesture.

另一示例系統包括前述組件,且該使用者輸入裝置為觸摸感應顯示器且該使用者輸入為觸摸事件。 Another example system includes the aforementioned components, and the user input device is a touch-sensitive display and the user input is a touch event.

另一示例系統包括前述組件,且該轉換包含音調偏移及時間延長中之至少一者。 Another example system includes the aforementioned components, and the conversion includes at least one of pitch offset and time extension.

根據另一態樣,提供一種方法。該方法可包括選擇化身;起始通訊;偵測使用者輸入;識別使用者輸入;基於使用者輸入識別動畫命令;基於動畫命令產生化身參數;及傳輸動畫命令及化身參數中之至少一者。 According to another aspect, a method is provided. The method may include selecting an avatar; initiating communication; detecting user input; identifying user input; identifying animation commands based on user input; generating avatar parameters based on the animation commands; and transmitting at least one of animation commands and avatar parameters.

另一示例方法包括前述操作且進一步包括:擷取使用者語音且將使用者語音轉化成相應使用者語音信號;將使用者語音信號轉換成化身語音信號;以及傳輸化身語音信號。 Another example method includes the foregoing operations and further includes: capturing user voice and converting the user voice into a corresponding user voice signal; converting the user voice signal into an avatar voice signal; and transmitting the avatar voice signal.

另一示例方法包括前述操作且進一步包括:擷取影像;偵測影像中的臉部;自臉部提取特徵;以及將特徵轉化成化身參數。 Another example method includes the aforementioned operations and further includes: capturing an image; detecting a face in the image; extracting features from the face; and converting the features into avatar parameters.

另一示例方法包括前述操作且進一步包括:顯示至少一化身;接收遠程動畫命令及遠程化身參數中之至少一者;以及基於遠程動畫命令及遠程化身參數中之至少一者使一所顯示化身成動畫。 Another example method includes the foregoing operations and further includes: displaying at least one avatar; receiving at least one of a remote animation command and a remote avatar parameter; and making a displayed avatar based on at least one of the remote animation command and the remote avatar parameter Animation.

另一示例方法包括前述操作且進一步包括:接收遠程化身語音信號;以及將遠程化身語音信號轉化成化身語音。 Another example method includes the foregoing operations and further includes: receiving a remote avatar voice signal; and converting the remote avatar voice signal into avatar voice.

另一示例方法包括前述操作且該使用者輸入為手勢。 Another example method includes the aforementioned operation and the user input is a gesture.

另一示例方法包括前述操作且該使用者輸入為觸摸事件。 Another example method includes the aforementioned operation and the user input is a touch event.

另一示例方法包括前述操作且該轉換包含音調偏移及時間延長中之至少一者。根據另一態樣,提供一種系統。該系統可包括一或多個儲存媒體,該一或多個儲存媒體上儲存有單獨的或呈組合的指令,在藉由一或多個處理器執行該等指令時產生以下操作,包括選擇化身;起始通訊;偵測使用者輸入;識別使用者輸入;基於使用者輸入識別動畫 命令;產生化身參數;以及傳輸動畫命令及化身參數中之至少一者。 Another example method includes the foregoing operation and the conversion includes at least one of pitch offset and time extension. According to another aspect, a system is provided. The system may include one or more storage media with individual or combined instructions stored on them, which when executed by one or more processors produce the following operations, including the selection of avatars ; Initiating communication; Detecting user input; Recognizing user input; Recognizing animation commands based on user input; Generating avatar parameters; and transmitting at least one of animation commands and avatar parameters.

另一示例系統包括指令,當藉由一或多個處理器執行該等指令時產生前述操作,並且亦包括:擷取使用者語音且將使用者語音轉化成相應使用者語音信號;將使用者語音信號轉換成化身語音信號;以及傳輸化身語音信號。 Another example system includes instructions that generate the aforementioned operations when executed by one or more processors, and also includes: capturing user voice and converting the user voice into a corresponding user voice signal; converting the user Convert voice signals into avatar voice signals; and transmit avatar voice signals.

另一示例系統包括指令,當藉由一或多個處理器執行該等指令時產生前述操作,並且亦包括:擷取影像;偵測影像中的臉部;自臉部提取特徵;以及將特徵轉化成化身參數。 Another example system includes instructions that generate the aforementioned operations when executed by one or more processors, and also includes: capturing an image; detecting a face in the image; extracting features from the face; and integrating features Transform into avatar parameters.

另一示例系統包括指令,當藉由一或多個處理器執行該等指令時產生前述操作,並且亦包括:顯示至少一化身;接收遠程動畫命令及遠程化身參數中之至少一者;以及基於遠程動畫命令及遠程化身參數中之至少一者使一所顯示化身成動畫。 Another example system includes instructions that generate the aforementioned operations when executed by one or more processors, and also includes: displaying at least one avatar; receiving at least one of remote animation commands and remote avatar parameters; and based on At least one of the remote animation command and the remote avatar parameters animate a displayed avatar.

另一示例系統包括指令,當藉由一或多個處理器執行該等指令時產生前述操作,並且亦包括:接收遠程化身語音信號;以及將遠程化身語音信號轉化成化身語音。 Another example system includes instructions that generate the aforementioned operations when executed by one or more processors, and also includes: receiving a remote avatar voice signal; and converting the remote avatar voice signal into avatar voice.

另一示例系統包括指令,當藉由一或多個處理器執行該等指令時產生前述操作,並且該使用者輸入為手勢。 Another example system includes instructions that are generated when the instructions are executed by one or more processors, and the user input is a gesture.

另一示例系統包括指令,當藉由一或多個處理器執行該等指令時產生前述操作,並且該使用者輸入為觸摸事件。 Another example system includes instructions that generate the aforementioned operations when executed by one or more processors, and the user input is a touch event.

另一示例系統包括指令,當藉由一或多個處理器執行該等指令時產生前述操作,並且該轉換包含音調偏移 及時間延長中之至少一者。 Another example system includes instructions that generate the aforementioned operations when executed by one or more processors, and the conversion includes at least one of pitch offset and time extension.

本文已使用之用詞及表述係用作描述之用詞且並非限制,且在使用此等用詞及表述時,不欲排除所展示及所描述的特徵之任何等效物(或其部分),且應認識到,在申請專利範圍之範疇內,可能存在各種修改。因此,申請專利範圍意欲涵蓋所有此類等效物。 The terms and expressions used herein are used for description and are not limiting, and in using these terms and expressions, it is not intended to exclude any equivalents (or parts thereof) of the features shown and described And, it should be recognized that within the scope of patent application, there may be various modifications. Therefore, the scope of patent application is intended to cover all such equivalents.

100‧‧‧裝置至裝置系統/系統 100‧‧‧device-to-device system/system

102、112‧‧‧裝置/遠程裝置 102、112‧‧‧device/remote device

104、114‧‧‧攝影機 104、114‧‧‧Camera

106、116‧‧‧麥克風 106, 116‧‧‧ microphone

107、117‧‧‧揚聲器 107, 117‧‧‧ speaker

108、118‧‧‧觸摸感應顯示器/顯示器 108、118‧‧‧Touch-sensitive display/display

110、120‧‧‧化身 110、120‧‧‧avatar

122‧‧‧網路 122‧‧‧ Internet

124‧‧‧伺服器 124‧‧‧Server

Claims (27)

一種具有指令儲存其上的一或多個非暫時性電腦可讀儲存裝置,該等指令由一第一計算裝置的至少一處理器執行時,導致進行包含下列的操作:致能一第一化身的選擇;識別出該第一計算裝置之一使用者的一或多個面部特徵;產生要被傳送至一第二計算裝置之資訊,用以致使經選擇之該第一化身要動畫式顯示於該第二計算裝置之一顯示器上,其中,該資訊係基於被識別出的該第一計算裝置之該使用者的該一或多個面部特徵;基於一使用者輸入命令而致能經選擇之該第一化身的動畫,其中,該使用者輸入命令係有別於該一或多個面部特徵,且當由該第一計算裝置之該使用者控制時,該使用者輸入命令要由一使用者輸入裝置來產生;及轉換該第一計算裝置之該使用者之語音資訊成為目標語音資訊以被傳送至該第二計算裝置;其中,該轉換要使用一或多個聲音效果以失真化該第一計算裝置之該使用者之該語音資訊。 One or more non-transitory computer-readable storage devices having instructions stored thereon, when the instructions are executed by at least one processor of a first computing device, results in operations including the following: enabling a first avatar The selection of one of the first computing device; recognize one or more facial features of a user of the first computing device; generate information to be transmitted to a second computing device to cause the selected first avatar to be animatedly displayed on On a display of the second computing device, wherein the information is based on the identified facial feature(s) of the user of the first computing device; based on a user input command to enable the selected The animation of the first avatar, wherein the user input command is different from the one or more facial features, and when controlled by the user of the first computing device, the user input command is to be used by a user Input device to generate; and convert the voice information of the user of the first computing device into target voice information to be transmitted to the second computing device; wherein the conversion uses one or more sound effects to distort the The voice information of the user of the first computing device. 如請求項1之一或多個儲存裝置,其中,該一或多個面部特徵是要從該第一計算裝置的該使用者之一或多個視訊影像所識別出。 One or more storage devices according to claim 1, wherein the one or more facial features are to be recognized from one or more video images of the user of the first computing device. 如請求項1之一或多個儲存裝置,其中,該等指令由該第一計算裝置的該至少一處理器執行時,導致進行包含下列的進一步操作:處理要被傳輸到該第二計算裝置的該第一計算裝置之該使用者的音訊資訊。 One or more storage devices according to claim 1, wherein when the instructions are executed by the at least one processor of the first computing device, further operations including the following are performed: processing is to be transferred to the second computing device The audio information of the user of the first computing device. 如請求項1之一或多個儲存裝置,其中,該等指令由該第一計算裝置的該至少一處理器執行時,導致進行包含下列的進一步操作:致能一第二化身的選擇;產生要被傳送至該第二計算裝置之第二資訊,用以致使經選擇之該第二化身要動畫式顯示於該第二計算裝置之該顯示器上,其中,該第二資訊係基於被識別出的該第一計算裝置之該使用者的該一或多個面部特徵;以及致使在該第一計算裝置上之經選擇的該第二化身之顯示以致能該第一計算裝置之該使用者來觀察在該第二計算裝置上之經選擇的該第二化身之一外觀。 One or more storage devices according to claim 1, wherein when the instructions are executed by the at least one processor of the first computing device, further operations including the following are performed: enabling selection of a second avatar; generating The second information to be transmitted to the second computing device to cause the selected second avatar to be animatedly displayed on the display of the second computing device, wherein the second information is based on the recognition The one or more facial features of the user of the first computing device; and causing the display of the selected second avatar on the first computing device to enable the user of the first computing device to Observe the appearance of one of the selected second avatars on the second computing device. 如請求項1之一或多個儲存裝置,其中,該一或多個聲音效果包含一音調平移聲音效果。 One or more storage devices according to claim 1, wherein the one or more sound effects include a pitch shift sound effect. 
如請求項1之一或多個儲存裝置,其中該等指令由該第一計算裝置的該至少一處理器執行時,導致進行包含下列的進一步操作:致使在該第一計算裝置上之經選擇的該第一化身之顯示以致能該第一計算裝置之該使用者來觀察在該 第二計算裝置上之經選擇的該第一化身之一外觀。 One or more storage devices according to claim 1, wherein when the instructions are executed by the at least one processor of the first computing device, further operations including the following are performed: causing selection on the first computing device The display of the first avatar enables the user of the first computing device to observe the The appearance of one of the selected first avatars on the second computing device. 一種第一計算裝置,包含:用來儲存指令與資料的記憶體電路;用來顯示一化身的一顯示器;以及用來處理一或多個指令以進行包含下列操作的處理器電路:致能一第一化身的選擇;識別出該第一計算裝置之一使用者的一或多個面部特徵;產生要被傳送至一第二計算裝置的資訊,用以致使經選擇的該第一化身來動畫式顯示於該第一計算裝置之一顯示器上,其中,該資訊係基於被識別出的該第一計算裝置之該使用者的該一或多個面部特徵;基於一使用者輸入命令而致能經選擇的該第一化身的動畫;其中,該使用者輸入命令係有別於該一或多個面部特徵,且當由該第一計算裝置之該使用者控制時,該使用者輸入命令要由一使用者輸入裝置來產生;並且轉換該第一計算裝置之該使用者之語音資訊成為目標語音資訊以被傳送至該第二計算裝置;其中,該轉換要使用一或多個聲音效果以失真化該第一計算裝置之該使用者之該語音資訊。 A first computing device includes: a memory circuit for storing instructions and data; a display for displaying an avatar; and a processor circuit for processing one or more instructions to perform operations including: enabling one Selection of the first avatar; identifying one or more facial features of a user of one of the first computing devices; generating information to be transmitted to a second computing device to cause the selected first avatar to animate Is displayed on a display of the first computing device, wherein the information is based on the identified facial feature(s) of the user of the first computing device; enabled based on a user input command The selected animation of the first avatar; wherein the user input command is different from the one or more facial features, and when controlled by the user of the first computing device, the user input command Generated by a user input device; and converting the user's voice information of the first computing device into target voice information to be transmitted to the second computing device; wherein, the conversion uses one or more sound effects to Distorting the voice information of the user of the first computing device. 如請求項7的第一計算裝置,進一步包含:可擷取該第一計算裝置之該使用者的一或多個視 訊影像的一視訊攝錄裝置;其中,該一或多個面部特徵是從被擷取的該第一計算裝置之該使用者的該一或多個視訊影像所識別出。 The first computing device of claim 7 further includes: one or more views of the user of the first computing device A video recording device for video images; wherein the one or more facial features are recognized from the one or more video images of the user of the first computing device that was captured. 如請求項7的第一計算裝置,進一步包含一音訊擷取裝置,該音訊擷取裝置可擷取要被傳輸至該第二計算裝置之該第一計算裝置之該使用者的音訊資訊。 The first computing device of claim 7 further includes an audio capturing device that can capture audio information of the user of the first computing device to be transmitted to the second computing device. 
如請求項7的第一計算裝置,其中,該處理器用以處理一或多個指令以進行包含下列的進一步操作:致能一第二化身的選擇;產生要被傳送至該第二計算裝置的第二資訊,用以致使經選擇的該第二化身來動畫式顯示於該第一計算裝置之該顯示器上,其中,該第二資訊係基於被識別出的該第一計算裝置的該使用者的該一或多個面部特徵;且致使在該第一計算裝置上之經選擇的該第二化身之顯示以致能該第一計算裝置之該使用者來觀察在該第一計算裝置上之經選擇的該第二化身之一外觀。 The first computing device of claim 7, wherein the processor is configured to process one or more instructions to perform further operations including: enabling selection of a second avatar; generating a to be transmitted to the second computing device Second information for causing the selected second avatar to be animatedly displayed on the display of the first computing device, wherein the second information is based on the identified user of the first computing device The one or more facial features; and causes the display of the selected second avatar on the first computing device to enable the user of the first computing device to observe the experience on the first computing device The appearance of one of the selected second avatars. 如請求項7的第一計算裝置,其中,該一或多個聲音效果包含一音調平移聲音效果。 The first computing device of claim 7, wherein the one or more sound effects include a pitch-shifting sound effect. 如請求項7之一或多個儲存裝置,其中該處理器用以處理一或多個指令以進行包含下列的進一步操作:致使在該第一計算裝置上之經選擇的該第一化身之顯示以致能該第一計算裝置之該使用者來觀察在該第一計算裝置上之經選擇的該第一化身之一外觀。 One or more storage devices as in claim 7, wherein the processor is to process one or more instructions to perform further operations including the following: causing the display of the selected first avatar on the first computing device to cause The user of the first computing device can observe the appearance of one of the selected first avatars on the first computing device. 一種用於通訊的方法,包含:用一第一計算裝置來致能一第一化身的選擇;用該第一計算裝置來識別出該第一計算裝置之一使用者的一或多個面部特徵;用該第一計算裝置來產生要被傳送至一第二計算裝置的資訊,用以致使經選擇的該第一化身來動畫式顯示於該第二計算裝置之一顯示器上,其中,該資訊係基於被識別出的該第一計算裝置之該使用者的該一或多個面部特徵;用該第一計算裝置來基於一使用者輸入命令而致能經選擇的該第一化身的動畫,其中,該使用者輸入命令係有別於該一或多個面部特徵,且當由該第一計算裝置之該使用者控制時,該使用者輸入命令要由一使用者輸入裝置來產生;以及藉由該第一計算裝置來轉換該第一計算裝置之該使用者之語音資訊成為目標語音資訊以被傳送至該第二計算裝置;其中,該轉換要使用一或多個聲音效果以失真化該第一計算裝置之該使用者之該語音資訊。 A method for communication includes: using a first computing device to enable selection of a first avatar; using the first computing device to identify one or more facial features of a user of the first computing device ; Using the first computing device to generate information to be transmitted to a second computing device to cause the selected first avatar to be animatedly displayed on a display of the second computing device, wherein the information Based on the one or more facial features of the user of the identified first computing device; using the first computing device to enable animation of the selected first avatar based on a user input command, Wherein the user input command is different from the one or more facial features, and when controlled by the user of the first computing device, the user input command is generated by a user input device; and The first computing device converts the user's voice information of the first computing device into target voice information to be transmitted to the second computing device; wherein, the conversion uses one or more sound effects to distort The voice information of the user of the first computing device. 如請求項第13項的方法,其中,該一或多個面部特徵是從該第一計算裝置的該使用者之一或多個視訊影像所識別出。 The method of claim 13, wherein the one or more facial features are recognized from one or more video images of the user of the first computing device. 
如請求項第13項的方法,進一步包含:藉由該第一計算裝置而處理要被傳輸到該第二計算裝置的該第一計算裝置之該使用者的音訊資訊。 The method of claim 13, further comprising: processing, by the first computing device, audio information of the user of the first computing device to be transmitted to the second computing device. 如請求項第13項的方法,進一步包含:藉由該第一計算裝置而致能一第二化身的選擇;藉由該第一計算裝置而產生要被傳送至該第二計算裝置的第二資訊,用以致使經選擇的該第二化身來動畫式顯示於該第二計算裝置之該顯示器上,其中,該第二資訊係基於被識別出的該第一計算裝置之該使用者的該一或多個面部特徵;並且由該第一計算裝置顯示經選擇的該第二化身於該第一計算裝置上以致能該第一計算裝置之該使用者來觀察在該第二計算裝置上之經選擇的該第二化身之一外觀。 The method of claim 13, further comprising: enabling selection of a second avatar by the first computing device; generating a second to be sent to the second computing device by the first computing device Information for causing the selected second avatar to be animatedly displayed on the display of the second computing device, wherein the second information is based on the identified user of the user of the first computing device One or more facial features; and the selected second avatar is displayed on the first computing device by the first computing device to enable the user of the first computing device to observe on the second computing device The appearance of one of the selected second avatars. 聲音如請求項第13項的方法,其中,該一或多個聲音效果包含一音調平移聲音效果。 The sound is the method of claim 13, wherein the one or more sound effects include a pitch shift sound effect. 如請求項第13項的方法,更包含:由該第一計算裝置顯示經選擇的該第一化身於該第一計算裝置上以致能該第一計算裝置之該使用者來觀察在該第二計算裝置上之經選擇的該第一化身之一外觀。 The method of claim 13, further comprising: displaying, by the first computing device, the selected first avatar on the first computing device to enable the user of the first computing device to observe the second The appearance of one of the selected first avatars on the computing device. 一種第一計算裝置,包含:一化身選擇模組,其用以致能一第一化身的選擇;一特徵提取模組,其用以識別出該第一計算裝置之一使用者的一或多個面部特徵;一化身控制模組,其用以:產生要被傳送至一第二計算裝置的資訊,用以 致使經選擇之該第一化身來動畫式顯示於該第一計算裝置之一顯示器上,其中,該資訊係基於被識別出的該第一計算裝置之該使用者的該一或多個面部特徵;並且基於一使用者輸入命令而致能經選擇之該第一化身的動畫,其中,該使用者輸入命令係有別於該一或多個面部特徵,且當由該第一計算裝置之該使用者控制時,該使用者輸入命令要由一使用者輸入裝置來產生,及一音訊轉換模組,其用以:轉換該第一計算裝置之該使用者之語音資訊成為目標語音資訊以被傳送至該第二計算裝置;其中,該轉換要使用一或多個聲音效果以失真化該第一計算裝置之該使用者之該語音資訊。 A first computing device includes: an avatar selection module for enabling selection of a first avatar; a feature extraction module for identifying one or more users of a user of the first computing device Facial features; an avatar control module used to: generate information to be sent to a second computing device for Causing the selected first avatar to be animatedly displayed on a display of the first computing device, wherein the information is based on the identified one or more facial features of the user of the first computing device ; And based on a user input command to enable the selected animation of the first avatar, wherein the user input command is different from the one or more facial features, and when the first computing device by the During user control, the user input command is generated by a user input device, and an audio conversion module is used to: convert the user's voice information of the first computing device into target voice information to be Sent to the second computing device; wherein, the conversion uses one or more sound effects to distort the voice information of the user of the first computing device. 如請求項19的第一計算裝置,進一步包含一面部偵測與追蹤模組,其用以偵測與追蹤該第一計算裝置之該使用者的一面部。 The first computing device of claim 19 further includes a face detection and tracking module for detecting and tracking a face of the user of the first computing device. 
如請求項19的第一計算裝置,進一步包含一音訊擷取模組,其用以擷取要被傳輸至該第二計算裝置的該第一計算裝置之該使用者的音訊資訊。 The first computing device of claim 19 further includes an audio extraction module for capturing audio information of the user of the first computing device to be transmitted to the second computing device. 如請求項19的第一計算裝置,其中,該化身選擇模組可進一步用以致能一第二化身的選擇。 The first computing device of claim 19, wherein the avatar selection module can be further used to enable selection of a second avatar. 如請求項22的第一計算裝置,其中,該化身控制模組進一步用以: 產生要被傳送至該第二計算裝置之第二資訊,用以致使經選擇之該第二化身來動畫式顯示於該第一計算裝置之該顯示器上,其中,該第二資訊係基於被識別出的該第一計算裝置之該使用者的該一或多個面部特徵。 The first computing device of claim 22, wherein the avatar control module is further used to: Generating second information to be transmitted to the second computing device for causing the selected second avatar to be animatedly displayed on the display of the first computing device, wherein the second information is based on being recognized The one or more facial features of the user of the first computing device. 如請求項23的第一計算裝置,進一步包含一顯示模組,用以將經選擇的該第二化身顯示於該第一計算裝置上,以致能該第一計算裝置的該使用者來觀察在該第一計算裝置上之經選擇的該第二化身的一外觀。 The first computing device of claim 23 further includes a display module for displaying the selected second avatar on the first computing device, so that the user of the first computing device can observe An appearance of the selected second avatar on the first computing device. 如請求項19的第一計算裝置,其中,該一或多個聲音效果包含一音調平移聲音效果。 The first computing device of claim 19, wherein the one or more sound effects include a pitch shift sound effect. 如請求項19的第一計算裝置,進一步包含一顯示模組以在該第一計算裝置上顯示經選擇之該第一化身以致能該第一計算裝置之該使用者來觀察在該第一計算裝置上之經選擇之該第一化身之一外觀。 The first computing device of claim 19 further includes a display module to display the selected first avatar on the first computing device to enable the user of the first computing device to observe the first computing device The appearance of one of the first avatars selected on the device. 如請求項19的第一計算裝置,進一步包含一視訊攝錄裝置,用以擷取該第一計算裝置的該使用者的一或多個視訊影像,其中,該一或多個面部特徵是要從被擷取的該第一計算裝置的該使用者之該一或多個視訊影像所識別出。 The first computing device of claim 19 further includes a video recording device for capturing one or more video images of the user of the first computing device, wherein the one or more facial features are Recognized from the one or more video images of the user of the first computing device that was captured.
TW107137526A 2013-04-08 2013-04-08 Communication using interactive avatars TWI682669B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW107137526A TWI682669B (en) 2013-04-08 2013-04-08 Communication using interactive avatars

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW107137526A TWI682669B (en) 2013-04-08 2013-04-08 Communication using interactive avatars

Publications (2)

Publication Number Publication Date
TW201924321A TW201924321A (en) 2019-06-16
TWI682669B true TWI682669B (en) 2020-01-11

Family

ID=67702321

Family Applications (1)

Application Number Title Priority Date Filing Date
TW107137526A TWI682669B (en) 2013-04-08 2013-04-08 Communication using interactive avatars

Country Status (1)

Country Link
TW (1) TWI682669B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080059570A1 (en) * 2006-09-05 2008-03-06 Aol Llc Enabling an im user to navigate a virtual world
US20090055484A1 (en) * 2007-08-20 2009-02-26 Thanh Vuong System and method for representation of electronic mail users using avatars
CN101981538A (en) * 2008-04-03 2011-02-23 诺基亚公司 Automated selection of avatar characteristics for groups

Also Published As

Publication number Publication date
TW201924321A (en) 2019-06-16

Similar Documents

Publication Publication Date Title
US11595617B2 (en) Communication using interactive avatars
TWI656505B (en) System and method for avatar management and selection
TWI642306B (en) System and method for avatar generation, rendering and animation
US9398262B2 (en) Communication using avatar
US9936165B2 (en) System and method for avatar creation and synchronization
TWI682669B (en) Communication using interactive avatars
TWI583198B (en) Communication using interactive avatars
TW202107250A (en) Communication using interactive avatars