US20180336716A1 - Voice effects based on facial expressions - Google Patents

Voice effects based on facial expressions

Info

Publication number
US20180336716A1
US20180336716A1 (application US15/908,603)
Authority
US
United States
Prior art keywords
audio
avatar
video
audio signal
virtual avatar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/908,603
Inventor
Sean A. Ramprashad
Carlos M. Avendano
Aram M. Lindahl
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Apple Inc
Original Assignee
Apple Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Apple Inc filed Critical Apple Inc
Priority to US15/908,603 priority Critical patent/US20180336716A1/en
Assigned to APPLE INC. reassignment APPLE INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: AVENDANO, CARLOS M., LINDAHL, ARAM M., RAMPRASHAD, SEAN A.
Priority to US16/033,111 priority patent/US10861210B2/en
Publication of US20180336716A1 publication Critical patent/US20180336716A1/en
Priority to KR1020207022657A priority patent/KR102367143B1/en
Priority to CN201980016107.6A priority patent/CN111787986A/en
Priority to DE112019001058.1T priority patent/DE112019001058T5/en
Priority to PCT/US2019/019554 priority patent/WO2019168834A1/en

Classifications

    • G06T 13/40: 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G06T 13/80: 2D [Two Dimensional] animation, e.g. using sprites
    • G06F 3/0304: Detection arrangements using opto-electronic means
    • G06F 3/04842: Selection of displayed objects or displayed text elements
    • G06F 18/00: Pattern recognition
    • G06F 3/012: Head tracking input arrangements
    • G06F 3/0484: Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F 3/167: Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G06K 9/00315
    • G06V 20/20: Scenes; Scene-specific elements in augmented reality scenes
    • G06V 40/175: Facial expression recognition; Static expression
    • G06V 40/176: Facial expression recognition; Dynamic expression
    • G10L 15/02: Feature extraction for speech recognition; Selection of recognition unit
    • H04L 51/08: Annexed information, e.g. attachments
    • H04L 51/10: Multimedia information
    • H04L 51/52: User-to-user messaging for supporting social networking services
    • H04M 1/72436: User interfaces specially adapted for cordless or mobile telephones, with interactive means for internal management of messages for text messaging, e.g. SMS or e-mail
    • G06F 3/04886: Interaction techniques based on graphical user interfaces [GUI] using a touch-screen or digitiser, by partitioning the display area of the touch-screen or the surface of the digitising tablet into independently controllable areas, e.g. virtual keyboards or menus
    • H04L 51/04: Real-time or near real-time messaging, e.g. instant messaging [IM]
    • H04L 51/58: Message adaptation for wireless communication
    • H04M 1/72439: User interfaces specially adapted for cordless or mobile telephones, with interactive means for internal management of messages for image or video messaging
    • H04M 2250/52: Details of telephonic subscriber devices including functional features of a camera
    • H04N 23/611: Control of cameras or camera modules based on recognised objects, where the recognised objects include parts of the human body
    • H04N 23/63: Control of cameras or camera modules by using electronic viewfinders
    • H04W 4/12: Messaging; Mailboxes; Announcements

Definitions

  • Multimedia content such as emojis can represent a variety of predefined people, objects, actions, and/or other things.
  • Some messaging applications allow users to select from a predefined library of emojis, which can be sent as part of a message that can contain other content (e.g., other multimedia and/or textual content).
  • Animojis are one type of this other multimedia content, where a user can select an avatar (e.g., a puppet) to represent themselves.
  • the animoji can move and talk as if it were a video of the user.
  • Animojis enable users to create personalized versions of emojis in a fun and creative way.
  • Embodiments of the present disclosure can provide systems, methods, and computer-readable medium for implementing avatar video clip revision and playback techniques.
  • a computing device can present a user interface (UI) for tracking a user's face and presenting a virtual avatar representation (e.g., a puppet or video character version of the user's face).
  • the computing device can capture audio and video information, extract and detect context as well as facial feature characteristics and voice feature characteristics, revise the audio and/or video information based at least in part on the extracted/identified features, and present a video clip of the avatar using the revised audio and/or video information.
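  • To make that pipeline concrete, the following minimal Python sketch models the capture, extract, revise, and present stages described above; the function names, feature fields, and threshold values are illustrative assumptions rather than details taken from the patent.

```python
def capture_av(duration_s):
    """Stand-in for camera + microphone capture during a recording session."""
    video_frames = [{"t": i / 30.0, "mouth_open": 0.2} for i in range(int(duration_s * 30))]
    audio_chunks = [{"t": i / 100.0, "level_db": -25.0, "words": []} for i in range(int(duration_s * 100))]
    return video_frames, audio_chunks

def extract_features(video_frames, audio_chunks):
    """Facial feature characteristics and voice feature characteristics."""
    facial = [{"t": f["t"], "mouth_open": f["mouth_open"]} for f in video_frames]
    voice = [{"t": c["t"], "level_db": c["level_db"], "words": c["words"]} for c in audio_chunks]
    return facial, voice

def revise(facial, voice):
    """Placeholder for the context/effect logic: produce audio adjustments
    (and possibly adjusted facial metadata) from the combined features."""
    effects = [{"t": v["t"], "effect": "insert_animal_sound"}
               for v in facial if v["mouth_open"] > 0.8]
    return facial, effects

def present_preview(facial, effects):
    """Stand-in for rendering the avatar clip with the adjusted audio."""
    return {"frames": len(facial), "audio_effects": effects}

video_frames, audio_chunks = capture_av(duration_s=1.0)
facial, voice = extract_features(video_frames, audio_chunks)
adjusted_facial, effects = revise(facial, voice)
print(present_preview(adjusted_facial, effects))
```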
  • a computer-implemented method for implementing various audio and video effects techniques may be provided.
  • the method may include displaying a virtual avatar generation interface.
  • the method may also include displaying first preview content of a virtual avatar in the virtual avatar generation interface, the first preview content of the virtual avatar corresponding to real-time preview video frames of a user headshot in a field of view of the camera and associated changes in the appearance of the headshot.
  • the method may also include detecting an input in the virtual avatar generation interface while displaying the first preview content of the virtual avatar.
  • the method in response to detecting the input in the virtual avatar generation interface, may also include: capturing, via the camera, a video signal associated with the user headshot during a recording session, capturing, via the microphone, a user audio signal during the recording session, extracting audio feature characteristics from the captured user audio signal, and extracting facial feature characteristics associated with the face from the captured video signal. Additionally, in response to detecting expiration of the recording session, the method may also include: generating an adjusted audio signal from the captured audio signal based at least in part on the facial feature characteristics and the audio feature characteristics, generating second preview content of the virtual avatar in the virtual avatar generation interface according to the facial feature characteristics and the adjusted audio signal, and presenting the second preview content in the virtual avatar generation interface.
  • the method may also include storing facial feature metadata associated with the facial feature characteristics extracted from the video signal and generating adjusted facial feature metadata from the facial feature metadata based at least in part on the facial feature characteristics and the audio feature characteristics. Additionally, the second preview of the virtual avatar may be displayed further according to the adjusted facial metadata. In some examples, the first preview of the virtual avatar may be displayed according to preview facial feature characteristics identified according to the changes in the appearance of the face during a preview session.
  • an electronic device for implementing various audio and video effects techniques may be provided.
  • the system may include a camera, a microphone, a library of pre-recorded/pre-determined audio, and one or more processors in communication with the camera and the microphone.
  • the processors may be configured to execute computer-executable instructions to perform operations.
  • the operations may include detecting an input in a virtual avatar generation interface while displaying a first preview of a virtual avatar.
  • the operations may also include initiating a capture session in response to detecting the input in the virtual avatar generation interface.
  • the capture session may include: capturing, via the camera, a video signal associated with a face in a field of view of the camera, capturing, via the microphone, an audio signal associated with the captured video signal, extracting audio feature characteristics from the captured audio signal, and extracting facial feature characteristics associated with the face from the captured video signal.
  • the operations may also include generating an adjusted audio signal based at least in part on the audio feature characteristics and the facial feature characteristics and presenting the second preview content in the virtual avatar generation interface, at least in response to detecting expiration of the capture session.
  • the audio signal may be further adjusted based at least in part on a type of the virtual avatar. Additionally, the type of the virtual avatar may be received based at least in part on an avatar type selection affordance presented in the virtual avatar generation interface. In some instances, the type of the virtual avatar may include an animal type, and the adjusted audio signal may be generated based at least in part on a predetermined sound associated with the animal type. The use and timing of predetermined sounds may be based on audio features from the captured audio and/or facial features from the captured video. This predetermined sound may also be itself modified based on audio features from the captured audio and facial features from the captured video. In some examples, the one or more processors may be further configured to determine whether a portion of the audio signal corresponds to the face in the field of view.
  • the portion of the audio signal may be stored for use in generating the adjusted audio signal and/or in accordance with a determination that the portion of the audio signal does not correspond to the face, at least the portion of the audio signal may be discarded and not considered for modification and/or playback.
  • the audio feature characteristics may comprise features of a voice associated with the face in the field of view.
  • the one or more processors may be further configured to store facial feature metadata associated with the facial feature characteristics extracted from the video signal.
  • the one or more processors may be further configured to store audio feature metadata associated with the audio feature characteristics extracted from the audio signal.
  • the one or more processors may be further configured to generate adjusted facial metadata based at least in part on the facial feature characteristics and the audio feature characteristics, and the second preview of the virtual avatar may be generated according to the adjusted facial metadata and the adjusted audio signal.
  • a computer-readable medium may be provided.
  • the computer-readable medium may include computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform operations.
  • the operations may include performing the following actions in response to detecting a request to generate an avatar video clip of a virtual avatar: capturing, via a camera of an electronic device, a video signal associated with a face in a field of view of the camera, capturing, via a microphone of the electronic device, an audio signal, extracting voice feature characteristics from the captured audio signal, and extracting facial feature characteristics associated with the face from the captured video signal.
  • the operations may also include performing the following actions in response to detecting a request to preview the avatar video clip: generating an adjusted audio signal based at least in part on the facial feature characteristics and the voice feature characteristics, and displaying a preview of the video clip of the virtual avatar using the adjusted audio signal.
  • the audio signal may be adjusted based at least in part on a facial expression identified in the facial feature characteristics associated with the face. In some instances, the audio signal may be adjusted based at least in part on a level, pitch, duration, formant, or change in a voice characteristic associated with the face. Further, in some embodiments, the one or more processors may be further configured to perform the operations comprising transmitting the video clip of the virtual avatar to another electronic device.
  • FIG. 1 is a simplified block diagram illustrating example flow for providing audio and/or video effects techniques as described herein, according to at least one example.
  • FIG. 2 is another simplified block diagram illustrating example flow for providing audio and/or video effects techniques as described herein, according to at least one example.
  • FIG. 3 is another simplified block diagram illustrating hardware and software components for providing audio and/or video effects techniques as described herein, according to at least one example.
  • FIG. 4 is a flow diagram to illustrate providing audio and/or video effects techniques as described herein, according to at least one example.
  • FIG. 5 is another flow diagram to illustrate providing audio and/or video effects techniques as described herein, according to at least one example.
  • FIG. 6 is a simplified block diagram illustrating a user interface for providing audio and/or video effects techniques as described herein, according to at least one example.
  • FIG. 7 is another flow diagram to illustrate providing audio and/or video effects techniques as described herein, according to at least one example.
  • FIG. 8 is another flow diagram to illustrate providing audio and/or video effects techniques as described herein, according to at least one example.
  • FIG. 9 is a simplified block diagram illustrating a computer architecture for providing audio and/or video effects techniques as described herein, according to at least one example.
  • Certain embodiments of the present disclosure relate to devices, computer-readable medium, and methods for implementing various techniques for providing voice effects (e.g., revised audio) based at least in part on facial expressions. Additionally, in some cases, the various techniques may also provide video effects based at least in part on audio characteristics of a recording. Even further, the various techniques may also provide voice effects and video effects (e.g., together) based at least in part on one or both of facial expressions and audio characteristics of a recording. In some examples, the voice effects and/or video effects may be presented in a user interface (UI) configured to display a cartoon representation of a user (e.g., an avatar or digital puppet). Such an avatar that represents a user may be considered an animoji, as it may look like an emoji character familiar to most smart phone users; however, it can be animated to mimic actual motions of the user.
  • a user of a computing device may be presented with a UI for generating an animoji video (e.g., a video clip).
  • the video clip can be limited to a predetermined amount of time (e.g., 10 seconds, 30 seconds, or the like), or the video clip can be unlimited.
  • a preview area may present the user with a real-time representation of their face, using an avatar character.
  • Multiple avatar characters may be provided, and a user may even be able to generate or import their own avatars.
  • the preview area may be configured to provide an initial preview of the avatar and a preview of the recorded video clip.
  • the recorded video clip may be previewed in its original form (e.g., without any video or audio effects) or it may be previewed with audio and/or video effects.
  • the user may select an avatar after the initial video clip has been recorded.
  • the video clip preview may then change from one avatar to another, with the same or different video effects applied to it, as appropriate.
  • If a new avatar is selected while the raw preview (e.g., original form, without effects) is being presented, the UI may be updated to display a rendering of the same video clip but with the newly selected avatar.
  • The facial features and audio (e.g., the user's voice) are mapped to the newly selected avatar, so that in the preview it will appear as if the avatar character is moving the same way the user moved during the recording, and speaking what the user said during the recording.
  • a user may select a first avatar (e.g., a unicorn head) via the UI, or a default avatar can be initially provided.
  • the UI will present the avatar (in this example, the head of a cartoon unicorn if selected by the user or any other available puppet by default) in the preview area, and the device will begin capturing audio and/or video information (e.g., using one or more microphones and/or one or more cameras).
  • only video information is needed for the initial preview screen.
  • the video information can be analyzed, and facial features can be extracted. These extracted facial features can then be mapped to the unicorn face in real-time, such that the initial preview of the unicorn head appears to mirror the user's face.
  • the term real-time is used to indicate that the results of the extraction, mapping, rendering, and presentation are performed in response to each motion of the user and can be presented substantially immediately. To the user, it will appear as if they are looking in the mirror, except that the image of their face is replaced with an avatar.
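  • A minimal sketch of that real-time mirroring loop is shown below; the feature names, the puppet interface, and the control mapping are assumptions made for illustration only.

```python
from typing import Dict

FrameFeatures = Dict[str, float]   # hypothetical per-frame facial features, normalized 0..1

class UnicornPuppet:
    """Toy avatar rig: applying controls just records them."""
    def __init__(self):
        self.controls: Dict[str, float] = {}

    def apply(self, controls: Dict[str, float]) -> None:
        self.controls = controls

def map_features_to_puppet(features: FrameFeatures) -> Dict[str, float]:
    # The user's extracted features drive the corresponding puppet controls,
    # so the avatar appears to mirror the user's face frame by frame.
    return {
        "jaw_open": features.get("mouth_open", 0.0),
        "brow_height": 1.0 - features.get("brow_furrow", 0.0),
        "smile": features.get("smile", 0.0),
    }

puppet = UnicornPuppet()
for frame in ({"mouth_open": 0.2, "smile": 0.7}, {"mouth_open": 0.6, "brow_furrow": 0.9}):
    puppet.apply(map_features_to_puppet(frame))   # one update per captured frame
    print(puppet.controls)
```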
  • While the user's face is in the line of sight (e.g., the view) of a camera of the device, the UI will continue to present the initial preview.
  • the device may begin to capture video that has an audio component. In some examples, this includes a camera capturing frames and a microphone capturing audio information.
  • a special camera may be utilized that is capable of capturing 3-dimensional (3D) information as well.
  • any camera may be utilized that is capable of capturing video.
  • the video may be stored in its original form and/or metadata associated with the video may be stored. As such, capturing the video and/or audio information may be different from storing the information.
  • capturing the information may include sensing the information and at least caching it such that it is available for processing.
  • the processed data can also be cached until it is determined whether to store or simply utilize the data.
  • For the initial preview, the video data (e.g., metadata associated with the data) may not be stored permanently at all, such that the initial preview is not reusable or recoverable.
  • the video data and the audio data may be stored more permanently.
  • the audio and video (A/V) data may be analyzed, processed, etc., in order to provide the audio and video effects described herein.
  • the video data may be processed to extract facial features (e.g., facial feature characteristics) and those facial features may be stored as metadata for the animoji video clip.
  • the set of metadata may be stored with an identifier (ID) that indicates the time, date, and user associated with the video clip.
  • The audio data may be stored with the same or another ID.
  • the system may extract audio feature characteristics from the audio data and facial feature characteristics from the video file. This information can be utilized to identify context, key words, intent, and/or emotions of the user, and video and audio effects can be introduced into audio and video data prior to rendering the puppet.
  • the audio signal can be adjusted to include different words, sounds, tones, pitches, timing, etc., based at least in part on the extracted features.
  • the video data (e.g., the metadata) can likewise be adjusted based at least in part on the extracted features.
  • audio features are extracted in real-time during the preview itself. These audio features may be avatar specific, generated only if the associated avatar is being previewed.
  • the audio features may be avatar agnostic, generated for all avatars.
  • the audio signal can also be adjusted in part based on these real-time audio feature extractions, and with the pre-stored extracted video features which are created during or after the recording process, but before previewing.
  • a second preview of the puppet can be rendered. This rendering may be performed for each possible puppet, such that as the user scrolls through and selects different puppets, the adjusted data is already rendered. Or the rendering can be performed after selection of each puppet. In any event, once the user selects a puppet, the second preview can be presented. The second preview will replay the video clip that was recorded by the user, but with the adjusted audio and/or video. Using the example from above, if the user recorded themselves with an angry tone (e.g., with a gruff voice and a furrowed brow), the context or intent of anger may be detected, and the audio file may be adjusted to include a growling sound.
  • the second preview would look like a unicorn saying the words that the user said; however, the voice of the user may be adjusted to sound like a growl, or to make the tone more baritone (e.g., lower).
  • the user could then save the second preview or select it for transmission to another user (e.g., through a messaging application or the like).
  • the animoji video clips described above and below can be shared as .mov files.
  • the described techniques can be used in real-time (e.g., with video messaging or the like).
  • FIG. 1 is a simplified block diagram illustrating example flow 100 for providing audio and/or video effects based at least in part on audio and/or video features detected in a user's recording.
  • In example flow 100, there are two separate sessions: recording session 102 and playback session 104.
  • device 106 may capture video having an audio component of user 108 at block 110 .
  • the video and audio may be captured (e.g., collected) separately, using two different devices (e.g., a microphone and a camera).
  • the capturing of video and audio may be triggered based at least in part on selection of a record affordance by user 108 .
  • user 108 may say the word “hello” at block 112 .
  • device 106 may continue to capture the video and/or audio components of the user's actions.
  • device 106 can continue capturing the video and audio components, and in this example, user 108 may say the word “bark.”
  • device 106 may also extract spoken words from the audio information.
  • the spoken word extraction (or any audio feature extraction) may actually take place after recording session 102 is complete.
  • the spoken word extraction (or any audio feature extraction) may actually take place during the preview block 124 in real-time. It is also possible for the extraction (e.g., analysis of the audio) to be done in real-time while recording session 102 is still in process. In either case, the avatar process being executed by device 106 may identify through the extraction that the user said the word “bark” and may employ some logic to determine what audio effects to implement.
  • recording session 102 may end when user 108 selects the record affordance again (e.g., indicating a desire to end the recording), selects an end recording affordance (e.g., the record affordance may act as an end recording affordance while recording), or based at least in part on expiration of a time period (e.g., 10 seconds, 30 seconds, or the like). In some cases, this time period may be automatically predetermined, while in others, it may be user selected (e.g., selected from a list of options or entered in free form through a text entry interface).
  • user 108 may select a preview affordance, indicating that user 108 wishes to watch a preview of the recording.
  • One option could be to play the original recording without any visual or audio effects. However, another option could be to play a revised version of the video clip. Based at least in part on detection of the spoken word “bark,” the avatar process may have revised the audio and/or video of the video clip.
  • device 106 may present avatar (also called a puppet and/or animoji) 118 on a screen.
  • Device 106 may also be configured with speaker 120 that can play audio associated with the video clip.
  • block 116 corresponds to the same point in time as block 110 , where user 108 may have had his mouth open, but was not yet speaking.
  • avatar 118 may be presented with his mouth open; however, no audio is presented from speaker 120 yet.
  • the avatar process can present avatar 118 with an avatar-specific voice. In other words, a predefined dog voice may be used to say the word “hello” at block 122 .
  • the dog-voice word “hello” can be presented by speaker 120 .
  • each avatar may be associated with a particular pre-defined voice that best fits that avatar.
  • For example, a dog may have a dog voice, a cat may have a cat voice, a pig may have a pig voice, and a robot may have a robotic voice.
  • These avatar-specific voices may be pre-recorded or may be associated with particular frequency or audio transformations that can be applied by executing mathematical operations on the original sound, such that any user's voice can be transformed to sound like the dog voice.
  • each user's dog voice may sound different based at least in part on the particular audio transformation performed.
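  • As one deliberately simple illustration of executing mathematical operations on the original sound, the sketch below shifts pitch by naive resampling; a production voice transform would be more sophisticated (e.g., preserving formants and duration), so this is only a conceptual example.

```python
import numpy as np

def pitch_shift_by_resampling(samples: np.ndarray, ratio: float) -> np.ndarray:
    """Crude pitch shift: resample the waveform by `ratio`.

    ratio > 1.0 raises the pitch (and shortens the clip); ratio < 1.0 lowers it
    (and lengthens the clip). The duration side effect is why timing changes
    are later communicated to the avatar engine.
    """
    n_out = int(len(samples) / ratio)
    x_old = np.linspace(0.0, 1.0, num=len(samples))
    x_new = np.linspace(0.0, 1.0, num=n_out)
    return np.interp(x_new, x_old, samples)

# Example: a 440 Hz test tone transformed toward a lower "dog" voice.
sr = 16000
t = np.arange(sr) / sr
voice = np.sin(2 * np.pi * 440.0 * t)
dog_voice = pitch_shift_by_resampling(voice, ratio=0.8)   # roughly 20% lower
print(len(voice), len(dog_voice))
```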
  • the avatar process may replace the spoken word (e.g., “bark”) with an avatar-specific word.
  • For example, the sound of a dog bark (e.g., a recorded or simulated dog bark) can be inserted into the audio data (e.g., in place of the word "bark").
  • different avatar-specific words will be presented at 124 based at least in part on different avatar selections, and in other examples, the same avatar-specific word may be presented regardless of the avatar selections. For example, if user 108 said “bark,” a “woof” could be presented when the dog avatar is selected.
  • the process could convert the “bark” into a “woof” even though it wouldn't be appropriate for a cat to “woof.”
  • the process could convert “bark” into a recorded or simulated “meow,” based at least in part on the selection of the cat avatar.
  • the process could ignore the “bark” for avatars other than the dog avatar.
  • there may be a second level of audio feature analysis performed even after the extraction at 114 .
  • Video and audio features may also influence processing on the avatar-specific utterances. For example, the level, pitch, and intonation with which a user says "bark" may be detected as part of the audio feature extraction, and this may direct the system to select a specific "woof" sample or transform such a sample before and/or during the preview process.
  • FIG. 2 is another simplified block diagram illustrating example flow 200 for providing audio and/or video effects based at least in part on audio and/or video features detected in a user's recording.
  • example flow 200 much like in example flow 100 of FIG. 1 , there are two separate sessions: recording session 202 and playback session 204 .
  • recording session 202 device 206 may capture video having an audio component of user 208 at block 210 .
  • the capturing of video and audio may be triggered based at least in part on selection of a record affordance by user 208 .
  • user 208 may say the word “hello” at block 212 .
  • device 206 may continue to capture the video and/or audio components of the user's actions.
  • device 206 can continue capturing the video and audio components, and in this example, user 208 may hold his mouth open, but not say anything.
  • device 206 may also extract facial expressions from the video. However, in other examples, the facial feature extraction (or any video feature extraction) may actually take place after recording session 202 is complete. Still, it is possible for the extraction (e.g., analysis of the video) to be done in real-time while recording session 202 is still in process. In either case, the avatar process being executed by device 206 may identify through the extraction that the user opened his mouth briefly (e.g., without saying anything) and may employ some logic to determine what audio and/or video effects to implement.
  • the determination that the user held their mouth open without saying anything may require extraction and analysis of both audio and video; for example, extraction of the facial feature characteristics (e.g., an open mouth) from the video may be paired with detection of silence in the audio.
  • Video and audio features may also influence processing on the avatar specific utterances.
  • the duration of the opening of the mouth, opening of eyes, etc. may direct the system to select a specific “woof” sample or transform such a sample before and/or during the preview process.
  • One such transformation is changing the level and/or duration of the woof to match the detected opening and closing of the user's mouth.
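  • A sketch of that level/duration matching is shown below; the naive resampling stretch and peak scaling are stand-ins for whatever transformation the system actually applies.

```python
import numpy as np

def fit_sample_to_gesture(sample: np.ndarray, sample_rate: int,
                          mouth_open_s: float, target_peak: float) -> np.ndarray:
    """Stretch a pre-recorded sound to the detected mouth-open duration and
    scale its level toward the captured voice level.

    The naive resampling stretch also shifts pitch slightly; a real system
    would more likely use a proper time-scale modification.
    """
    n_target = max(1, int(mouth_open_s * sample_rate))
    x_old = np.linspace(0.0, 1.0, num=len(sample))
    x_new = np.linspace(0.0, 1.0, num=n_target)
    stretched = np.interp(x_new, x_old, sample)
    peak = float(np.max(np.abs(stretched)))
    return stretched * (target_peak / peak) if peak > 0 else stretched

# A fake 0.3 s "woof" fitted to a detected 0.75 s mouth-open gesture.
sr = 16000
woof = np.sin(2 * np.pi * 120.0 * np.arange(int(0.3 * sr)) / sr)
fitted = fit_sample_to_gesture(woof, sr, mouth_open_s=0.75, target_peak=0.5)
print(round(len(fitted) / sr, 2), "seconds")
```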
  • recording session 202 may end when user 208 selects the record affordance again (e.g., indicating a desire to end the recording), selects an end recording affordance (e.g., the record affordance may act as an end recording affordance while recording), or based at least in part on expiration of a time period (e.g., 20 seconds, 30 seconds, or the like).
  • user 208 may select a preview affordance, indicating that user 208 wishes to watch a preview of the recording.
  • One option could be to play the original recording without any visual or audio effects.
  • another option could be to play a revised version of the recording.
  • the avatar process may have revised the audio and/or video of the video clip.
  • device 206 may present avatar (also called a puppet and/or animoji) 218 on a screen of device 206 .
  • Device 206 may also be configured with speaker 220 that can play audio associated with the video clip.
  • block 216 corresponds to the same point in time as block 210 , where user 208 may not have been speaking yet.
  • avatar 218 may be presented with his mouth open; however, no audio is presented from speaker 220 yet.
  • the avatar process can present avatar 218 with an avatar-specific voice (as described above).
  • the avatar process may replace the silence identified at block 214 with an avatar-specific word.
  • For example, the sound of a dog bark (e.g., a recorded or simulated dog bark) can be inserted into the audio data (e.g., in place of the silence).
  • different avatar-specific words will be presented at 224 based at least in part on different avatar selections, and in other examples, the same avatar-specific word may be presented regardless of the avatar selections.
  • each avatar may have a predefined sound to be played when it is detected that user 208 has held his mouth open for an amount of time (e.g., a half second, a whole second, etc.) without speaking.
  • the process could ignore the detection of the open mouth for avatars that don't have a predefined effect for that facial feature.
  • the process may also detect how many "woof" sounds to insert (e.g., if the user held his mouth open for double the length of time used to indicate a bark) or whether it is not possible to insert the number of barks requested (e.g., in the scenario of FIG. 1, where the user would speak "bark" to indicate that a "woof" sound should be inserted).
  • user 208 can control effects of the playback (e.g., the recorded avatar message) with their facial and voice expressions.
  • Further, while not shown explicitly in either FIG. 1 or FIG. 2, the user device can be configured with software for executing the avatar process (e.g., capturing the A/V information, extracting features, analyzing the data, implementing the logic, revising the audio and/or video files, and rendering the previews) as well as software for executing an application (e.g., an avatar application with its own UI) that enables the user to build the avatar messages and subsequently send them to other user devices.
  • FIG. 3 is a simplified block diagram 300 illustrating components (e.g., software modules) utilized by the avatar process described above and below. In some examples, more or fewer modules can be utilized to implement the providing of audio and/or video effects based at least in part on audio and/or video features detected in a user's recording.
  • device 302 may be configured with camera 304 , microphone 306 , and a display screen for presenting a UI and the avatar previews (e.g., the initial preview before recording as well as the preview of the recording before sending).
  • the avatar process is configured with avatar engine 308 and voice engine 310 .
  • Avatar engine 308 can manage the list of avatars, process the video features (e.g., facial feature characteristics), revise the video information, communicate with voice engine 310 when appropriate, and render video of the avatar 312 when all processing is complete and effects have been implemented (or discarded).
  • Revising of the video information can include adjusting or otherwise editing the metadata associated with the video file. In this way, when the video metadata (adjusted or not) is used to render the puppet, the facial features can be mapped to the puppet.
  • voice engine 310 can store the audio information, perform the logic for determining what effects to implement, revise the audio information, and provide modified audio 314 when all processing is complete and effects have been implemented (or discarded).
  • video features 316 can be captured by camera 304 and audio features 318 can be captured by microphone 306 . In some cases there may be as many as (or more than) fifty facial features to be detected within video features 316 .
  • Example video features include, but are not limited to, duration of expressions, open mouth, frowns, smiles, eyebrows up or furrowed, etc.
  • video features 316 may include only metadata that identifies each of the facial features (e.g., data points that indicate which locations on the user's face moved or were in which position). Further, video features 316 can be passed to avatar engine 308 and voice engine 310.
  • the metadata associated with video features 316 can be stored and analyzed.
  • avatar engine 308 may perform the feature extraction from the video file prior to storing the metadata.
  • the feature extraction may be performed prior to video features 316 being sent to avatar engine (in which case, video features 316 would be the metadata itself).
  • video features 316 may be compared with audio features 318 when it is helpful to match up what audio features correspond to which video features (e.g., to see if certain audio and video features occur at the same time).
  • audio features are also passed to voice engine 310 for storage.
  • Example audio features include, but are not limited to, level, pitch, dynamics (e.g., changes in level or pitch), voicing, formants, duration, etc.
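  • The per-stream feature records described above might be laid out as timestamped metadata along the lines of the following sketch; the field names are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class VideoFeatureFrame:
    """Facial feature metadata for one captured frame; no image data is kept."""
    timestamp: float
    mouth_open: float
    smile: float
    brow_furrow: float

@dataclass
class AudioFeatureFrame:
    """Voice feature metadata for one short analysis window."""
    timestamp: float
    level_db: float
    pitch_hz: float
    voiced: bool
    spotted_words: List[str] = field(default_factory=list)

# Keeping both streams timestamped makes it easy for the engines to check
# whether an audio event and a facial event occur at (roughly) the same time.
video = [VideoFeatureFrame(0.00, 0.8, 0.0, 0.1), VideoFeatureFrame(0.03, 0.9, 0.0, 0.1)]
audio = [AudioFeatureFrame(0.00, -60.0, 0.0, False), AudioFeatureFrame(0.03, -58.0, 0.0, False)]
print(video[0])
print(audio[0])
```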
  • Raw audio 320 includes the unprocessed audio file as it's captured.
  • Raw audio 320 can be passed to voice engine 310 for further processing and potential (e.g., eventual) revision and it can also be stored separately so that the original audio can be used if desired.
  • Raw audio 320 can also be passed to voice recognition module 322 .
  • Voice recognition module 322 can be used to word spot and identify a user's intent from their voice. For example, voice recognition module 322 can determine when a user is angry, sad, happy, or the like.
  • When the user expresses such an emotion or speaks a spotted key word (e.g., "bark"), voice recognition module 322 will detect this. Information detected and/or collected by voice recognition module 322 can then be passed to voice engine 310 for further logic and/or processing.
  • audio features are extracted in real-time during the preview itself. These audio features may be avatar specific, generated only if the associated avatar is being previewed. The audio features may be avatar agnostic, generated for all avatars. The audio signal can also be adjusted in part based on these real-time audio feature extractions, and with the pre-stored extracted video features which are created during or after the recording process, but before previewing. Additionally, some feature extraction may be performed during rendering at 336 by voice engine 310 . Some pre-stored sounds 338 may be used by voice engine 310 , as appropriate, to fill in the blanks or to replace other sounds that were extracted.
  • voice engine 310 will make the determination regarding what to do with the information extracted from voice recognition module 322 .
  • voice engine 310 can pass the information from voice recognition module 322 to feature module 324 for determining which features correspond to the data extracted by voice recognition module 322 .
  • feature module 324 may indicate (e.g., based on a set of rules and/or logic) that a sad voice detected by voice recognition module 322 corresponds to a raising of the pitch of the voice, or the slowing down of the speed or cadence of the voice.
  • feature module 324 can map the extracted audio features to particular voice features.
  • effect type module 326 can map the particular voice features to the desired effect.
  • Voice engine 310 can also be responsible for storing each particular voice for each possible avatar. For example, there may be standard or hardcoded voices for each avatar. Without any other changes being made, if a user selects a particular avatar, voice engine 310 can select the appropriate standard voice for use with playback. In this case, modified audio 314 may just be raw audio 320 transformed to the appropriate avatar voice based on the selected avatar. As the user scrolls through the avatars and selects different ones, voice engine 310 can modify raw audio 320 on the fly to make it sound like the newly selected avatar. Thus, avatar type 328 needs to be provided to voice engine 310 to make this change.
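  • The on-the-fly avatar voice switching described above could be structured as in the sketch below, where the raw audio is kept untouched and the avatar voice is re-derived whenever a new avatar type is selected; the transform functions are placeholders, not the actual avatar voices.

```python
from typing import Callable, Dict, List

# Placeholder per-avatar "standard voice" transforms (names and math are invented).
def dog_voice(samples: List[float]) -> List[float]:
    return [s * 0.9 for s in samples]        # stand-in for a real voice transform

def robot_voice(samples: List[float]) -> List[float]:
    return [round(s, 1) for s in samples]    # stand-in "quantized" robot effect

AVATAR_VOICES: Dict[str, Callable[[List[float]], List[float]]] = {
    "dog": dog_voice,
    "robot": robot_voice,
}

class VoiceEngine:
    """Keeps the raw audio untouched and re-derives the avatar voice on demand,
    so switching avatars never modifies the stored original."""
    def __init__(self, raw_audio: List[float]):
        self.raw_audio = raw_audio

    def modified_audio(self, avatar_type: str) -> List[float]:
        transform = AVATAR_VOICES.get(avatar_type, lambda s: list(s))  # default: unchanged
        return transform(self.raw_audio)

engine = VoiceEngine([0.0, 0.25, -0.5, 0.75])
print(engine.modified_audio("dog"))
print(engine.modified_audio("robot"))
print(engine.modified_audio("unicorn"))   # no registered transform: raw audio passes through
```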
  • voice engine 310 can revise raw audio file 320 and provide modified audio 314 .
  • the user will be provided with an option to use the original audio file at on/off 330 . If the user selects “off” (e.g., effects off), then raw audio 320 can be combined with video of avatar 312 (e.g., corresponding to the unchanged video) to make A/V output 332 .
  • A/V output 332 can be provided to the avatar application presented on the UI of device 302 .
  • Avatar engine 308 can be responsible for providing the initial avatar image based at least in part on the selection of avatar type 328 . Additionally, avatar engine 308 is responsible for mapping video features 316 to the appropriate facial markers of each avatar. For example, if video features 316 indicate that the user is smiling, the metadata that indicates a smile can be mapped to the mouth area of the selected avatar so that the avatar appears to be smiling in video of avatar 312 . Additionally, avatar engine 308 can receive timing changes 334 from voice engine, as appropriate.
  • If voice engine 310 determines that the voice effect is to make the audio more of a whispering voice (e.g., based on feature module 324 and/or effect type 326 and/or the avatar type), and modifies the voice to be more of a whispered voice, this effect change may include slowing down the voice itself, in addition to a reduced level and other formant and pitch changes. Accordingly, the voice engine may produce modified audio which is slower in playback speed relative to the original audio file for the audio clip. In this scenario, voice engine 310 would need to instruct avatar engine 308 via timing changes 334, so that the video file can be slowed down appropriately; otherwise, the video and audio would not be synchronized.
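  • The timing-change handshake described above might look like the following sketch, in which the voice engine reports how much the whispered audio was slowed and the avatar engine rescales its frame timestamps by the same factor; the numbers and function names are illustrative.

```python
def whisper_effect(audio_duration_s: float, slowdown: float = 1.25):
    """Voice-engine side: the whispered audio comes out `slowdown` times longer,
    so the same factor is reported as a timing change for the avatar engine."""
    return {"new_audio_duration_s": audio_duration_s * slowdown,
            "timing_change": slowdown}

def apply_timing_change(frame_timestamps, timing_change):
    """Avatar-engine side: rescale the per-frame timestamps so the puppet's
    mouth movement still lines up with the slowed-down speech."""
    return [t * timing_change for t in frame_timestamps]

effect = whisper_effect(audio_duration_s=4.0)
frames = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
print(effect)
print(apply_timing_change(frames, effect["timing_change"]))
```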
  • a user may use the avatar application of device 302 to select different avatars.
  • the voice effect can change based at least in part on this selection.
  • the user may be given the opportunity to select a different voice for a given avatar (e.g., the cat voice for the dog avatar, etc.).
  • This type of free-form voice effect change can be executed by the user via selection on the UI or, in some cases, with voice activation or face motion.
  • a certain facial expression could trigger voice engine 310 to change the voice effect for a given avatar.
  • voice engine 310 may be configured to make children's voices sound more high pitched or, alternatively, determine not to make a child's voice more high pitched because it would sound inappropriate given that raw audio 320 for a child's voice might already be high pitched. Making this user specific determination of an effect could be driven in part by the audio features extracted, and in this case such features could include pitch values and ranges throughout the recording.
  • voice recognition module 322 may include a recognition engine, a word spotter, a pitch analyzer, and/or a formant analyzer. The analysis performed by voice recognition module 322 will be able to identify if the user is upset, angry, happy, etc. Additionally, voice recognition module 322 may be able to identify context and/or intonation of the user's voice, as well as change the intention of wording and/or determine a profile (e.g., a virtual identity) of the user.
  • the avatar process 300 can be configured to package/render the video clip by combining video of avatar 312 and either modified audio 314 or raw audio 320 into A/V output 332 .
  • voice engine 310 just needs to know an ID for the metadata associated with video of avatar 312 (e.g., it does not actually need video of avatar 312 , it just needs the ID of the metadata).
  • The user can then select to send a message within a messaging application (e.g., the avatar application), where the message includes A/V output 332.
  • the last video clip to be previewed can be sent.
  • For example, if the cat avatar was the last one previewed, the cat avatar video would be sent when the user selects "send."
  • the state of the last preview can be stored and used later. For example, if the last message (e.g., avatar video clip) sent used a particular effect, the first preview of the next message being generated can utilize that particular effect.
  • voice engine 310 and/or avatar engine 308 can check for certain cues and/or features, and then revise the audio and/or video files to implement the desired effect.
  • Some example feature/effect pairs include: detecting that the user has opened their mouth and paused for a moment. In this example, both facial feature characteristics (e.g., mouth open) and audio feature characteristics (e.g., silence) need to happen at the same time in order for the desired effect to be implemented. For this feature/effect pair, the desired effect is to revise the audio and video so that the avatar appears to make an avatar/animal-specific sound.
  • For example, a dog will make a bark sound, a cat will make a meow sound, and a monkey, horse, unicorn, etc., will make their respective sounds.
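  • Detecting that feature/effect pair amounts to finding spans where an open mouth and silence co-occur; the sketch below assumes the facial and audio features are sampled on a shared set of timestamps, which is a simplification.

```python
def detect_open_mouth_pauses(facial, voice, min_duration_s=0.5, silence_db=-50.0):
    """Find spans where the mouth is open AND the audio is silent at the same
    time, the cue for inserting an avatar-specific sound."""
    spans, start = [], None
    for f, v in zip(facial, voice):
        cue = f["mouth_open"] > 0.7 and v["level_db"] < silence_db
        if cue and start is None:
            start = f["t"]
        elif not cue and start is not None:
            if f["t"] - start >= min_duration_s:
                spans.append((start, f["t"]))
            start = None
    if start is not None and facial and facial[-1]["t"] - start >= min_duration_s:
        spans.append((start, facial[-1]["t"]))
    return spans

facial = [{"t": i * 0.1, "mouth_open": 0.9 if 5 <= i <= 12 else 0.1} for i in range(20)]
voice = [{"t": i * 0.1, "level_db": -60.0 if 5 <= i <= 12 else -20.0} for i in range(20)]
print(detect_open_mouth_pauses(facial, voice))   # roughly [(0.5, 1.3)]
```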
  • Other example feature/effect pairs include lowering the audio pitch and/or tone when a frown is detected.
  • this effect could be implemented based at least in part on voice recognition module 322 detecting sadness in the voice of the user.
  • video features 316 wouldn't be needed at all.
  • Other example feature/effect pairs include whispering, which can cause the audio and video speeds to be slowed down, levels to be toned down, and/or changes to be reduced.
  • video changes can lead to modifications of the audio while, in other cases, audio changes can lead to modifications of the video.
  • avatar engine 308 may act as the feature extractor, in which case video features 316 and audio features 318 may not exist prior to being sent to avatar engine 308 . Instead, raw audio 320 and metadata associated with the raw video may be passed into avatar engine 308 , where avatar engine 308 may extract the audio feature characteristics and the video (e.g., facial) feature characteristics. In other words, while not drawn this way in FIG. 3 , parts of avatar engine 308 may actually exist within camera 304 . Additionally, in some examples, metadata associated with video features 316 can be stored in a secure container, and when voice engine 310 is running, it can read the metadata from the container.
  • the audio and video information can be processed offline (e.g., not in real-time).
  • avatar engine 308 and voice engine 310 can read ahead in the audio and video information and make context decisions up front. Then, voice engine 310 can revise the audio file accordingly. This ability to read ahead and make decisions offline will greatly increase the efficiency of the system, especially for longer recordings. Additionally, this enables a second stage of analysis, where additional logic can be processed. Thus, the entire audio file can be analyzed before making any final decisions.
  • For example, if the user says "bark" twice in quick succession, voice engine 310 can take the information from voice recognition module 322 and determine to ignore the second "bark," because it won't be possible to include both "woof" sounds in the audio file.
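  • The offline read-ahead decision described above could be implemented as a single pass that drops cues whose inserted sounds would overlap, as in the following sketch; the sound duration and cue times are invented for illustration.

```python
def schedule_sound_insertions(cue_times_s, sound_duration_s=0.6):
    """Offline pass over the whole recording: keep only cues whose inserted
    sound would not overlap the previously scheduled one, dropping the rest
    (e.g., ignoring a second "bark" spoken too soon after the first)."""
    scheduled, last_end = [], float("-inf")
    for t in sorted(cue_times_s):
        if t >= last_end:
            scheduled.append(t)
            last_end = t + sound_duration_s
    return scheduled

# Two "bark" cues 0.3 s apart: only the first fits a 0.6 s "woof".
print(schedule_sound_insertions([1.0, 1.3, 4.2]))   # -> [1.0, 4.2]
```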
  • Even though the audio file and the video are packaged together to make A/V output 332, voice engine 310 does not actually need to access video of avatar 312.
  • the video file (e.g., a .mov format file, or the like) is created as the video is being played by accessing an array of features (e.g., floating-point values) that were written to the metadata file.
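In other words, playback can drive the selected puppet directly from the stored per-frame feature values rather than from saved camera frames. A minimal sketch of that idea, with hypothetical type names, might look like the following.

```swift
// Illustrative only: a per-frame record of facial feature values (e.g., values
// stored as floating-point numbers in the metadata file).
struct FacialFrame {
    let timestamp: Double
    let featureValues: [Float]   // e.g., jaw openness, brow position, smile amount
}

protocol PuppetRenderer {
    // Renders one frame of the selected avatar from feature values alone;
    // the raw camera video is never needed at this point.
    func renderFrame(features: [Float], at timestamp: Double)
}

// Drive the puppet from metadata as the clip plays back.
func playClip(frames: [FacialFrame], renderer: PuppetRenderer) {
    for frame in frames {
        renderer.renderFrame(features: frame.featureValues, at: frame.timestamp)
    }
}
```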
  • each modified video clip could be saved temporarily (e.g., cached), such that if the user reselects an avatar that's already been previewed, the processing to generate/render that particular preview does not need to be duplicated.
  • the above noted caching of rendered video clips would enable the realization of large savings in processor power and instructions per second (IPS), especially for longer recordings and/or recordings with a large number of effects.
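A cache along those lines could be as simple as a dictionary keyed by avatar and effect state, so a re-selected avatar is served from memory instead of being re-rendered. The sketch below is illustrative; the key fields and the Data payload are assumptions.

```swift
import Foundation

// A minimal cache of already-rendered previews, keyed by avatar and by whether
// effects were applied.
struct PreviewKey: Hashable {
    let avatarID: String
    let withEffects: Bool
}

final class PreviewCache {
    private var cache: [PreviewKey: Data] = [:]   // rendered clip bytes (e.g., a .mov)

    func clip(for key: PreviewKey, render: () -> Data) -> Data {
        if let cached = cache[key] {
            return cached                 // re-selection: no re-render needed
        }
        let rendered = render()           // first selection: render once, then cache
        cache[key] = rendered
        return rendered
    }

    func invalidateAll() { cache.removeAll() }   // e.g., after a new recording
}
```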
  • noise suppression algorithms can be employed for handling cases where the sound captured by microphone 306 includes sounds other than the user's voice. For example, when the user is in a windy area, or a loud room (e.g., a restaurant or bar). In these examples, a noise suppression algorithm could lower the decibel output of certain parts of the audio recording. Alternatively, or in addition, different voices could be separated and/or only audio coming from certain angles of view (e.g., the angle of the user's face) could be collected, and other voices could be ignored or suppressed. In other cases, if the avatar process 300 determines that the noise levels are too loud or will be difficult to process, the process 300 could disable the recording option.
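As a very rough illustration of lowering the decibel output of noisy portions, the sketch below attenuates short windows whose level falls below a threshold. Real noise suppression and direction-of-arrival filtering would be considerably more sophisticated; the window size and thresholds here are invented.

```swift
// A deliberately simple level-based gate: attenuate short windows whose RMS
// falls below a threshold (treated here as background noise rather than voice).
func gateNoise(samples: [Float],
               windowSize: Int = 1024,
               rmsThreshold: Float = 0.02,
               attenuation: Float = 0.1) -> [Float] {
    var output = samples
    var start = 0
    while start < samples.count {
        let end = min(start + windowSize, samples.count)
        let window = samples[start..<end]
        let meanSquare = window.reduce(0) { $0 + $1 * $1 } / Float(window.count)
        if meanSquare.squareRoot() < rmsThreshold {
            for i in start..<end { output[i] *= attenuation }
        }
        start = end
    }
    return output
}
```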
  • FIG. 4 illustrates an example flow diagram showing process 400 for implementing various audio and/or video effects based at least in part on audio and/or video features, according to at least a few embodiments.
  • computing device 106 of FIG. 1 or other similar user device (e.g., utilizing at least avatar process 300 of FIG. 3 ) may perform the process 400 of FIG. 4 .
  • computing device 106 may capture video having an audio component.
  • the video and audio may be captured by two different hardware components (e.g., a camera may capture the video information while a microphone may capture the audio information).
  • a single hardware component may be configured to capture both audio and video.
  • the video and audio information may be associated with one another (e.g., by sharing an ID, timestamp, or the like).
  • the video may have an audio component (e.g., they are part of the same file), or the video may be linked with an audio component (e.g., two files that are associated together).
  • computing device 106 may extract facial features and audio features from the captured video and audio information, respectively.
  • the facial feature information may be extracted via avatar engine 308 and stored as metadata.
  • the metadata can be used to map each facial feature to a particular puppet or to any animation or virtual face.
  • the actual video file does not need to be stored, creating memory storage efficiency and significant savings.
  • a voice recognition algorithm can be utilized to extract different voice features; for example, words, phrases, pitch, speed, etc.
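The outputs of this extraction step might be organized roughly as follows; the field names are assumptions used only to make the shape of the data concrete (per-frame facial values stored as metadata, plus word, pitch, and speed information from the voice).

```swift
// Illustrative containers for what block 404 might produce; names are not
// taken from the patent.
struct FacialFeatureMetadata {
    let timestamp: Double
    let featureValues: [Float]       // e.g., eyebrow position, mouth openness
}

struct VoiceFeatures {
    let words: [(text: String, time: Double)]   // recognized words with timestamps
    let averagePitchHz: Double
    let speechRateWordsPerMinute: Double
}

struct ExtractedFeatures {
    let facialFrames: [FacialFeatureMetadata]   // stored instead of the raw video frames
    let voice: VoiceFeatures
}
```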
  • computing device 106 may detect context from the extracted features.
  • context may include a user's intent, mood, setting, location, background items, ideas, etc.
  • the context can be important when employing logic to determine what effects to apply.
  • the context can be combined with detected spoken words to determine whether and/or how to adjust the audio file and/or the video file.
  • a user may furrow his eyebrows and speak slowly. The furrowing of the eyebrows is a video feature that could have been extracted at block 404 and the slow speech is an audio feature that could have been extracted at block 404 .
  • Individually, those two features might mean something different; however, when combined, the avatar process can determine that the user is concerned about something.
  • the context of the message might be that a parent is speaking to a child, or a friend is speaking to another friend about a serious or concerning matter.
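A toy version of that combination logic is sketched below: neither cue alone is decisive, but together they map to a "concern" context. The thresholds are invented for illustration.

```swift
// Combining one video feature and one audio feature into a context, per the
// furrowed-brow plus slow-speech example above.
enum DetectedContext { case concern, excitement, neutral }

func detectContext(browFurrowAmount: Float,       // 0...1 from the facial features
                   wordsPerMinute: Double) -> DetectedContext {
    let browFurrowed = browFurrowAmount > 0.6
    let slowSpeech = wordsPerMinute < 100
    if browFurrowed && slowSpeech {
        // Individually ambiguous; together they suggest the user is concerned.
        return .concern
    }
    if !browFurrowed && wordsPerMinute > 180 {
        return .excitement
    }
    return .neutral
}
```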
  • computing device 106 may determine effects for rendering the audio and/or video files based at least in part on the context.
  • a particular video and/or audio feature may be employed for this effect.
  • the voice file may be adjusted to sound more somber, or to be slowed down.
  • the avatar-specific voice might be replaced with a version of the original (e.g., raw) audio to convey the seriousness of the message.
  • the context may be animal noises (e.g., based on the user saying "bark" or "meow" or the like). In this case, the determined effect would be to replace the spoken word "bark" with the sound of a dog barking.
  • computing device 106 may perform additional logic for additional effects. For example, if the user attempted to effectuate the bark effect by saying bark twice in a row, the additional logic may need to be utilized to determine whether the additional bark is technically feasible. As an example, if the audio clip of the bark that is used to replace the spoken word in the raw audio information is 0.5 seconds long, but the user says “bark” twice in a 0.7-second span, the additional logic can determine that two bark sounds cannot fit in the 0.7 seconds available. Thus, the audio and video file may need to be extended in order to fit both bark sounds, the bark sound may need to be shortened (e.g., by processing the stored bark sound), or the second spoken word bark may need to be ignored.
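That feasibility check can be framed as a small decision function over the available time span and the stored sound's duration, with the three fallbacks described above (extend the clip, shorten the sound, or ignore the second cue). The names and thresholds below are illustrative.

```swift
// Sketch of the block 410 feasibility check.
enum SoundFitDecision {
    case playAll                      // enough room for every replacement sound
    case shortenEach(to: Double)      // time-compress the stored sound to fit
    case extendClip(by: Double)       // lengthen the audio/video to make room
    case ignoreExtra                  // drop the extra spoken cue (e.g., second "bark")
}

func decideSecondSound(availableSpan: Double,        // e.g., 0.7 s
                       soundDuration: Double,        // e.g., 0.5 s stored bark
                       soundCount: Int,              // e.g., 2
                       canExtendClip: Bool,
                       minimumUsableDuration: Double) -> SoundFitDecision {
    let needed = soundDuration * Double(soundCount)
    if needed <= availableSpan { return .playAll }
    let perSound = availableSpan / Double(soundCount)
    if perSound >= minimumUsableDuration { return .shortenEach(to: perSound) }
    if canExtendClip { return .extendClip(by: needed - availableSpan) }
    return .ignoreExtra
}

// Patent example: two 0.5 s barks need 1.0 s of audio but only 0.7 s is available.
let choice = decideSecondSound(availableSpan: 0.7, soundDuration: 0.5, soundCount: 2,
                               canExtendClip: true, minimumUsableDuration: 0.4)
// With these invented thresholds the clip is extended by 0.3 s; with a lower
// minimum usable duration the stored bark would be shortened to 0.35 s instead.
```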
  • computing device 106 may revise the audio and/or video information based at least in part on the determined effects and/or additional effects.
  • the raw audio file may be adjusted (e.g., revised) to form a new audio file with additional sounds added and/or subtracted.
  • the spoken word “bark” will be removed from the audio file and a new sound that represents an actual dog barking will be inserted.
  • the new file can be saved with a different ID, or with an appended ID (e.g., the raw audio ID, with a .v2 identifier to indicate that it is not the original). Additionally, the raw audio file will be saved separately so that it can be reused for additional avatars and/or if the user decides not to use the determined effects.
  • computing device 106 may receive a selection of an avatar from the user.
  • the user may select one of a plurality of different avatars through a UI of the avatar application being executed by computing device 106 .
  • the avatars may be selected via a scroll wheel, drop down menu, or icon menu (e.g., where each avatar is visible on the screen in its own position).
  • computing device 106 may present the revised video with the revised audio based at least in part on the selected avatar.
  • each adjusted video clip (e.g., a final clip for the avatar that has adjusted audio and/or adjusted video) may be generated for each respective avatar prior to selection of the avatar by the user. This way, the processing has already been completed, and the adjusted video clip is ready to be presented immediately upon selection of the avatar. While this might require additional IPS prior to avatar selection, it will speed up the presentation. Additionally, the processing of each adjusted video clip can be performed while the user is reviewing the first preview (e.g., the preview that corresponds to the first/default avatar presented in the UI).
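A sketch of that pre-rendering strategy follows: the default avatar's clip is produced first so its preview can play immediately, and the remaining avatars are rendered while the user watches it. The function and type names are assumptions, and a real implementation would likely render the non-default avatars on a background queue rather than sequentially.

```swift
import Foundation

// Pre-render an adjusted clip for every avatar so later selections play instantly.
struct AdjustedClip {
    let avatarID: String
    let media: Data               // rendered clip bytes (e.g., a .mov)
}

func prerenderAll(avatarIDs: [String],
                  defaultAvatarID: String,
                  render: (String) -> AdjustedClip) -> [String: AdjustedClip] {
    var ready: [String: AdjustedClip] = [:]
    // Render the default avatar first so its preview can be shown immediately.
    ready[defaultAvatarID] = render(defaultAvatarID)
    // Render the remaining avatars while the first preview is being reviewed.
    for id in avatarIDs where id != defaultAvatarID {
        ready[id] = render(id)
    }
    return ready
}
```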
  • FIG. 5 illustrates an example flow diagram showing process 500 for implementing various audio and/or video effects based at least in part on audio and/or video features, according to at least a few embodiments.
  • computing device 106 of FIG. 1 or other similar user device (e.g., utilizing at least avatar process 300 of FIG. 3 ) may perform the process 500 of FIG. 5 .
  • computing device 106 may capture video having an audio component.
  • the video and audio may be captured by two different hardware components (e.g., a camera may capture the video information while a microphone may capture the audio information).
  • the video may have an audio component (e.g., they are part of the same file), or the video may be linked with an audio component (e.g., two files that are associated together).
  • computing device 106 may extract facial features and audio features from the captured video and audio information, respectively.
  • the facial feature information may be extracted via avatar engine 308 and stored as metadata.
  • the metadata can be used to map each facial feature to a particular puppet or to any animation or virtual face.
  • the actual video file does not need to be stored, creating memory storage efficiency and significant savings.
  • a voice recognition algorithm can be utilized to extract different voice features; for example, words, phrases, pitch, speed, etc.
  • avatar engine 308 and/or voice engine 310 may perform the audio feature extraction.
  • computing device 106 may detect context from the extracted features.
  • context may include a user's intent, mood, setting, location, ideas, identity, etc.
  • the context can be important when employing logic to determine what effects to apply.
  • the context can be combined with spoken words to determine whether and/or how to adjust the audio file and/or the video file.
  • a user's age may be detected as the context (e.g., child, adult, etc.) based at least in part on facial and/or voice features.
  • a child's face may have particular features that can be identified (e.g., large eyes, a small nose, and a relatively small head, etc.). As such, a child context may be detected.
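Purely as an illustration of such an age-related context, a heuristic might combine facial proportions with voice pitch, as in the sketch below. The ratios and thresholds are invented and carry no claim of accuracy.

```swift
// Invented heuristic for a "child" context; not part of the patent.
struct FaceProportions {
    let eyeToFaceRatio: Float      // relatively large eyes
    let noseToFaceRatio: Float     // relatively small nose
}

func isLikelyChild(face: FaceProportions, voicePitchHz: Double) -> Bool {
    let childlikeFace = face.eyeToFaceRatio > 0.15 && face.noseToFaceRatio < 0.05
    let childlikeVoice = voicePitchHz > 250    // children tend to have higher pitch
    return childlikeFace && childlikeVoice
}
```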
  • computing device 106 may receive a selection of an avatar from the user.
  • the user may select one of a plurality of different avatars through a UI of the avatar application being executed by computing device 106 .
  • the avatars may be selected via a scroll wheel, drop down menu, or icon menu (e.g., where each avatar is visible on the screen in its own position).
  • computing device 106 may determine effects for rendering the audio and/or video files based at least in part on the context and the selected avatar.
  • the effects for each avatar may be generated upon selection of each avatar, as opposed to all at once. In some instances, this will enable realization of significant processor and memory savings, because only one set of effects and avatar rendering will be performed at a time. These savings can be realized especially when the user does not select multiple avatars to preview.
  • computing device 106 may perform additional logic for additional effects, similar to that described above with respect to block 410 of FIG. 4 .
  • computing device 106 may revise the audio and/or video information based at least in part on the determined effects and/or additional effects for the selected avatar, similar to that described above with respect to block 412 of FIG. 4 .
  • computing device 106 may present the revised video with the revised audio based at least in part on the selected avatar, similar to that described above with respect to block 416 of FIG. 4 .
  • the avatar process 300 may determine whether to perform flow 400 or flow 500 based at least in part on historical information. For example, if the user generally uses the same avatar every time, flow 500 will be more efficient. However, if the user regularly switches between avatars, and previews multiple different avatars per video clip, then following flow 400 may be more efficient.
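That choice could be driven by a simple statistic over past sessions, for example the average number of avatars previewed per clip. The sketch below is one way to express it; the threshold is arbitrary and the names are assumptions.

```swift
// Choose between the two flows based on usage history: flow 500 (render on
// selection) suits users who stick to one avatar; flow 400 (render everything
// up front) suits users who preview many avatars per clip.
enum RenderFlow {
    case preRenderAll        // corresponds to the flow of FIG. 4
    case renderOnSelection   // corresponds to the flow of FIG. 5
}

func chooseFlow(avatarsPreviewedPerClipHistory: [Int]) -> RenderFlow {
    guard !avatarsPreviewedPerClipHistory.isEmpty else { return .renderOnSelection }
    let average = Double(avatarsPreviewedPerClipHistory.reduce(0, +)) /
                  Double(avatarsPreviewedPerClipHistory.count)
    // Arbitrary threshold: frequent switching favors pre-rendering.
    return average > 2.0 ? .preRenderAll : .renderOnSelection
}
```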
  • FIG. 6 illustrates an example UI 600 for enabling a user to utilize the avatar application (e.g., corresponding to avatar application affordance 602 ).
  • UI 600 may look different (e.g., it may appear as a standard text (e.g., short messaging service (SMS)) messaging application) until avatar application affordance 602 is selected.
  • the avatar application can communicate with the avatar process (e.g., avatar process 300 of FIG. 3 ) to make requests for capturing, processing (e.g., extracting features, running logic, etc.), and adjusting audio and/or video.
  • the avatar application may make an application programming interface (API) call to the avatar process to begin capturing video and audio information using the appropriate hardware components.
  • record/send video clip affordance 604 may be represented as a red circle (or a plain circle without the line shown in FIG. 6 ) prior to the recording session beginning. In this way, the affordance will look more like a standard record button.
  • the appearance of record/send video clip affordance 604 may be changed to look like a clock countdown or other representation of a timer (e.g., if the length of video clip recordings is limited).
  • the record/send video clip affordance 604 may merely change colors to indicate that the avatar application is recording. If there is no timer, or limit on the length of the recording, the user may need to select record/send video clip affordance 604 again to terminate the recording.
  • a user may use avatar selection affordance 606 to select an avatar. This can be done before recording of the avatar video clip and/or after recording of the avatar video clip. When selected before recording, the initial preview of the user's motions and facial characteristics will be presented as the selected avatar. Additionally, the recording will be performed while presenting a live (e.g., real-time) preview of the recording, with the user's face being represented by the selected avatar. Once the recording is completed, a second preview (e.g., a replay of the actual recording) will be presented, again using the selected avatar. However, at this stage, the user can scroll through avatar selection affordance 606 to select a new avatar to view the recording preview.
  • Upon selection of a new avatar, the UI will begin to preview the recording using the selected avatar.
  • the new preview can be presented with the audio/video effects or as originally recorded.
  • the determination regarding whether to present the effected version or the original may be based at least in part on the last method of playback used. For example, if the last playback used effects, the first playback after a new avatar selection may use effects. However, if the last playback did not use effects, the first playback after a new avatar selection may not use effects.
  • the user can replay the video clip with effects by selecting effects preview affordance 608 or without effects by selecting original preview affordance 610.
  • the user can send the avatar video in a message to another computing device using record/send video clip affordance 604 .
  • the video clip will be sent using the format corresponding to the last preview (e.g., with or without effects).
  • delete video clip affordance 612 may be selected to delete the avatar video and either start over or exit the avatar and/or messaging applications.
  • FIG. 7 illustrates an example flow diagram showing process (e.g., a computer-implemented method) 700 for implementing various audio and/or video effects based at least in part on audio and/or video features, according to at least a few embodiments.
  • computing device 106 of FIG. 1 or other similar user device e.g., utilizing at least an avatar application similar to that shown in FIG. 6 and avatar process 300 of FIG. 3 ) may perform the process 700 of FIG. 7 .
  • computing device 106 may display a virtual avatar generation interface.
  • the virtual avatar generation interface may look similar to the UI illustrated in FIG. 6 . However, any UI configured to enable the same features described herein can be used.
  • computing device 106 may display first preview content of a virtual avatar.
  • the first preview content may be a real-time representation of the user's face, including movement and facial expressions.
  • the first preview would provide an avatar (e.g., cartoon character, digital/virtual puppet) to represent the user's face instead of an image of the user's face.
  • This first preview may be video only, or at least a rendering of the avatar without sound. In some examples, this first preview is not recorded and can be utilized for as long as the user desires, without limitation other than battery power or memory space of computing device 106.
  • computing device 106 may detect selection of an input (e.g., record/send video clip affordance 604 of FIG. 6 ) in the virtual avatar generation interface. This selection may be made while the UI is displaying the first preview content.
  • computing device 106 may begin capturing video and audio signals based at least in part on the input detected at block 706 .
  • the video and audio signals may be captured by appropriate hardware components and can be captured by one or a combination of such components.
  • computing device 106 may extract audio feature characteristics and facial feature characteristics as described in detail above. As noted, the extraction may be performed by particular modules of avatar process 300 of FIG. 3 or by other extraction and/or analysis components of the avatar application and/or computing device 106 .
  • computing device 106 may generate an adjusted audio signal based at least in part on facial feature characteristics and audio feature characteristics.
  • the audio file captured at block 708 may be permanently (or temporarily) revised (e.g., adjusted) to include new sounds, new words, etc., and/or to have the original pitch, tone, volume, etc., adjusted.
  • These adjustments can be made based at least in part on the context detected via analysis of the facial feature characteristics and audio feature characteristics. Additionally, the adjustments can be made based on the type of avatar selected and/or based on specific motions, facial expressions, words, phrases, or actions performed by the user (e.g., expressed by the user's face) during the recording session.
  • computing device 106 may generate second preview content of the virtual avatar in the UI according to the adjusted audio signal.
  • the generated second preview content may be based at least in part on the currently selected avatar or some default avatar. Once the second preview content is generated, computing device 106 can present the second preview content in the UI at block 716 .
  • FIG. 8 illustrates an example flow diagram showing process (e.g., instructions stored on a computer-readable memory that can be executed) 800 for implementing various audio and/or video effects based at least in part on audio and/or video features, according to at least a few embodiments.
  • computing device 106 of FIG. 1 or other similar user device e.g., utilizing at least an avatar application similar to that shown in FIG. 6 and avatar process 300 of FIG. 3 ) may perform the process 800 of FIG. 8 .
  • computing device 106 may detect a request to generate an avatar video clip of a virtual avatar.
  • the request may be based at least in part on a user's selection of send/record video clip affordance 604 of FIG. 6 .
  • computing device 106 may capture a video signal associated with a face in the field of view of the camera.
  • computing device 106 may capture an audio signal corresponding to the video signal (e.g., coming from the face being captured by the camera).
  • computing device 106 may extract voice feature characteristics from the audio signal and, at block 810, computing device 106 may extract facial feature characteristics from the video signal.
  • computing device 106 may detect a request to preview the avatar video clip. This request may be based at least in part on a user's selection of a new avatar via avatar selection affordance 606 of FIG. 6 or based at least in part on a user's selection of effects preview affordance 608 of FIG. 6 .
  • computing device 106 may generate an adjusted audio signal based at least in part on facial feature characteristics and voice feature characteristics.
  • the audio file captured at block 806 may be revised (e.g., adjusted) to include new sounds, new words, etc., and/or to have the original pitch, tone, volume, etc., adjusted.
  • These adjustments can be made based at least in part on the context detected via analysis of the facial feature characteristics and voice feature characteristics. Additionally, the adjustments can be made based on the type of avatar selected and/or based on specific motions, facial expressions, words, phrases, or actions performed by the user (e.g., expressed by the user's face) during the recording session.
  • computing device 106 may generate a preview of the virtual avatar in the UI according to the adjusted audio signal.
  • the generated preview may be based at least in part on the currently selected avatar or some default avatar.
  • computing device 106 can also present the preview in the UI at block 816.
  • FIG. 9 is a simplified block diagram illustrating example architecture 900 for implementing the features described herein, according to at least one embodiment.
  • computing device 902 (e.g., computing device 106 of FIG. 1 ) having example architecture 900 may be configured to present relevant UIs, capture audio and video information, extract relevant data, perform logic, revise the audio and video information, and present animoji videos.
  • Computing device 902 may be configured to execute or otherwise manage applications or instructions for performing the described techniques such as, but not limited to, providing a user interface (e.g., user interface 600 of FIG. 6 ) for recording, previewing, and/or sending virtual avatar video clips.
  • Computing device 902 may receive inputs (e.g., utilizing I/O device(s) 904 such as a touch screen) from a user at the user interface, capture information, process the information, and then present the video clips as previews, also utilizing I/O device(s) 904 (e.g., a speaker of computing device 902).
  • Computing device 902 may be configured to revise audio and/or video files based at least in part on facial features extracted from the captured video and/or voice features extracted from the captured audio.
  • Computing device 902 may be any type of computing device such as, but not limited to, a mobile phone (e.g., a smartphone), a tablet computer, a personal digital assistant (PDA), a laptop computer, a desktop computer, a thin-client device, a smart watch, a wireless headset, or the like.
  • computing device 902 may include at least one memory 914 and one or more processing units (or processor(s)) 916 .
  • Processor(s) 916 may be implemented as appropriate in hardware, computer-executable instructions, or combinations thereof.
  • Computer-executable instruction or firmware implementations of processor(s) 916 may include computer-executable or machine-executable instructions written in any suitable programming language to perform the various functions described.
  • Memory 914 may store program instructions that are loadable and executable on processor(s) 916 , as well as data generated during the execution of these programs.
  • memory 914 may be volatile (such as random access memory (RAM)) and/or non-volatile (such as read-only memory (ROM), flash memory, etc.).
  • Computing device 902 may also include additional removable storage and/or non-removable storage 926 including, but not limited to, magnetic storage, optical disks, and/or tape storage.
  • the disk drives and their associated non-transitory computer-readable media may provide non-volatile storage of computer-readable instructions, data structures, program modules, and other data for the computing devices.
  • memory 914 may include multiple different types of memory, such as static random access memory (SRAM), dynamic random access memory (DRAM), or ROM. While the volatile memory described herein may be referred to as RAM, any volatile memory that would not maintain data stored therein once unplugged from a host and/or power would be appropriate.
  • Memory 914 and additional storage 926 are all examples of non-transitory computer-readable storage media.
  • non-transitory computer readable storage media may include volatile or non-volatile, removable or non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data.
  • Memory 914 and additional storage 926 are both examples of non-transitory computer storage media.
  • Additional types of computer storage media may include, but are not limited to, phase-change RAM (PRAM), SRAM, DRAM, RAM, ROM, electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital video disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by computing device 902 . Combinations of any of the above should also be included within the scope of non-transitory computer-readable storage media.
  • computer-readable communication media may include computer-readable instructions, program modules, or other data transmitted within a data signal, such as a carrier wave, or other transmission.
  • computer-readable storage media does not include computer-readable communication media.
  • Computing device 902 may also contain communications connection(s) 928 that allow computing device 902 to communicate with a data store, another computing device or server, user terminals and/or other devices via one or more networks.
  • Such networks may include any one or a combination of many different types of networks, such as cable networks, the Internet, wireless networks, cellular networks, satellite networks, other private and/or public networks, or any combination thereof.
  • Computing device 902 may also include I/O device(s) 904 , such as a touch input device, a keyboard, a mouse, a pen, a voice input device, a display, a speaker, a printer, etc.
  • memory 914 may include operating system 932 and/or one or more application programs or services for implementing the features disclosed herein including user interface module 934 , avatar control module 936 , avatar application module 938 , and messaging module 940 .
  • Memory 914 may also be configured to store one or more audio and video files to be used to produce audio and video output. In this way, computing device 902 can perform all of the operations described herein.
  • user interface module 934 may be configured to manage the user interface of computing device 902 .
  • user interface module 934 may present any number of various UIs requested by computing device 902 .
  • user interface module 934 may be configured to present UI 600 of FIG. 6 , which enables implementation of the features described herein, including communication with avatar process 300 of FIG. 3 , which is responsible for capturing video and audio information, extracting appropriate facial feature and voice feature information, and revising the video and audio information prior to presentation of the generated avatar video clips as described above.
  • avatar control module 936 is configured to implement (e.g., execute instructions for implementing) avatar process 300 while avatar application module 938 is configured to implement the user facing application.
  • avatar application module 938 may utilize one or more APIs for requesting and/or providing information to avatar control module 936 .
  • messaging module 940 may implement any standalone or add-on messaging application that can communicate with avatar control module 936 and/or avatar application module 938 .
  • messaging module 940 may be fully integrated with avatar application module 938 (e.g., as seen in UI 600 of FIG. 6 ), where the avatar application appears to be part of the messaging application.
  • messaging module 940 may call to avatar application module 938 when a user requests to generate an avatar video clip, and avatar application module 938 may open up a new application altogether that is integrated with messaging module 940.
  • Computing device 902 may also be equipped with a camera and microphone, as shown in at least FIG. 3 , and processors 916 may be configured to execute instructions to display a first preview of a virtual avatar.
  • an input may be detected via a virtual avatar generation interface presented by user interface module 934 .
  • avatar control module 936 may initiate a capture session including: capturing, via the camera, a video signal associated with a face in a field of view of the camera, capturing, via the microphone, an audio signal associated with the captured video signal, extracting audio feature characteristics from the captured audio signal, and extracting facial feature characteristics associated with the face from the captured video signal.
  • avatar control module 936 may generate an adjusted audio signal based at least in part on the audio feature characteristics and the facial feature characteristics, and display a second preview of the virtual avatar in the virtual avatar generation interface according to the facial feature characteristics and the adjusted audio signal.
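Taken together, those operations suggest an interface roughly like the following sketch, in which the capture, extraction, adjustment, and preview steps run in the order described. All type and method names are assumptions rather than actual module APIs.

```swift
// Placeholder types standing in for whatever the real engines produce.
struct FacialFrameData { let featureValues: [Float] }
struct AudioFeatureSet { let words: [String]; let pitchHz: Double }
struct FacialFeatureSet { let frames: [FacialFrameData] }

// An interface-level sketch of the capture session described for avatar
// control module 936.
protocol AvatarControlling {
    func captureVideoSignal() -> [FacialFrameData]          // via the camera
    func captureAudioSignal() -> [Float]                     // via the microphone
    func extractAudioFeatures(from audio: [Float]) -> AudioFeatureSet
    func extractFacialFeatures(from frames: [FacialFrameData]) -> FacialFeatureSet
    func adjustedAudio(audio: [Float],
                       audioFeatures: AudioFeatureSet,
                       facialFeatures: FacialFeatureSet) -> [Float]
    func displaySecondPreview(facialFeatures: FacialFeatureSet, adjustedAudio: [Float])
}

// The overall session, in the order given above.
func runCaptureSession(using controller: AvatarControlling) {
    let video = controller.captureVideoSignal()
    let audio = controller.captureAudioSignal()
    let audioFeatures = controller.extractAudioFeatures(from: audio)
    let facialFeatures = controller.extractFacialFeatures(from: video)
    let adjusted = controller.adjustedAudio(audio: audio,
                                            audioFeatures: audioFeatures,
                                            facialFeatures: facialFeatures)
    controller.displaySecondPreview(facialFeatures: facialFeatures, adjustedAudio: adjusted)
}
```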
  • the various embodiments further can be implemented in a wide variety of operating environments, which in some cases can include one or more user computers, computing devices or processing devices which can be used to operate any of a number of applications.
  • User or client devices can include any of a number of general purpose personal computers, such as desktop or laptop computers running a standard operating system, as well as cellular, wireless and handheld devices running mobile software and capable of supporting a number of networking and messaging protocols.
  • Such a system also can include a number of workstations running any of a variety of commercially-available operating systems and other known applications for purposes such as development and database management.
  • These devices also can include other electronic devices, such as dummy terminals, thin-clients, gaming systems and other devices capable of communicating via a network.
  • Most embodiments utilize at least one network that would be familiar to those skilled in the art for supporting communications using any of a variety of commercially-available protocols, such as TCP/IP, OSI, FTP, UPnP, NFS, CIFS, and AppleTalk.
  • the network can be, for example, a local area network, a wide-area network, a virtual private network, the Internet, an intranet, an extranet, a public switched telephone network, an infrared network, a wireless network, and any combination thereof.
  • the network server can run any of a variety of server or mid-tier applications, including HTTP servers, FTP servers, CGI servers, data servers, Java servers, and business application servers.
  • the server(s) also may be capable of executing programs or scripts in response to requests from user devices, such as by executing one or more applications that may be implemented as one or more scripts or programs written in any programming language, such as Java®, C, C# or C++, or any scripting language, such as Perl, Python or TCL, as well as combinations thereof.
  • the server(s) may also include database servers, including without limitation those commercially available from Oracle®, Microsoft®, Sybase®, and IBM®.
  • the environment can include a variety of data stores and other memory and storage media as discussed above. These can reside in a variety of locations, such as on a storage medium local to (and/or resident in) one or more of the computers or remote from any or all of the computers across the network. In a particular set of embodiments, the information may reside in a storage-area network (SAN) familiar to those skilled in the art. Similarly, any necessary files for performing the functions attributed to the computers, servers or other network devices may be stored locally and/or remotely, as appropriate.
  • each such device can include hardware elements that may be electrically coupled via a bus, the elements including, for example, at least one central processing unit (CPU), at least one input device (e.g., a mouse, keyboard, controller, touch screen or keypad), and at least one output device (e.g., a display device, printer or speaker).
  • Such a system may also include one or more storage devices, such as disk drives, optical storage devices, and solid-state storage devices such as RAM or ROM, as well as removable media devices, memory cards, flash cards, etc.
  • Such devices can include a computer-readable storage media reader, a communications device (e.g., a modem, a network card (wireless or wired), an infrared communication device, etc.), and working memory as described above.
  • the computer-readable storage media reader can be connected with, or configured to receive, a non-transitory computer-readable storage medium, representing remote, local, fixed, and/or removable storage devices as well as storage media for temporarily and/or more permanently containing, storing, transmitting, and retrieving computer-readable information.
  • the system and various devices also typically will include a number of software applications, modules, services or other elements located within at least one working memory device, including an operating system and application programs, such as a client application or browser.
  • Non-transitory storage media and computer-readable storage media for containing code, or portions of code can include any appropriate media known or used in the art (except for transitory media like carrier waves or the like) such as, but not limited to, volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data, including RAM, ROM, Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, CD-ROM, DVD or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices or any other medium which can be used to store the desired information and which can be accessed by a system device.
  • Disjunctive language such as the phrase “at least one of X, Y, or Z,” unless specifically stated otherwise, is otherwise understood within the context as used in general to present that an item, term, etc., may be either X, Y, or Z, or any combination thereof (e.g., X, Y, and/or Z). Thus, such disjunctive language is not generally intended to, and should not, imply that certain embodiments require at least one of X, at least one of Y, or at least one of Z to each be present. Additionally, conjunctive language such as the phrase “at least one of X, Y, and Z,” unless specifically stated otherwise, should also be understood to mean X, Y, Z, or any combination thereof, including “X, Y, and/or Z.”

Abstract

Embodiments of the present disclosure can provide systems, methods, and computer-readable medium for adjusting audio and/or video information of a video clip based at least in part on facial feature and/or voice feature characteristics extracted from hardware components. For example, in response to detecting a request to generate an avatar video clip of a virtual avatar, a video signal associated with a face in a field of view of a camera and an audio signal may be captured. Voice feature characteristics and facial feature characteristics may be extracted from the audio signal and the video signal, respectively. In some examples, in response to detecting a request to preview the avatar video clip, an adjusted audio signal may be generated based at least in part on the facial feature characteristics and the voice feature characteristics, and a preview of the video clip of the virtual avatar using the adjusted audio signal may be displayed.

Description

    CROSS-REFERENCES TO RELATED APPLICATIONS
  • This application claims the benefit of U.S. Provisional Patent Application No. 62/507,177, entitled “Emoji Recording and Sending,” filed May 16, 2017, U.S. Provisional Patent Application No. 62/556,412, entitled “Emoji Recording and Sending,” filed Sep. 9, 2017, and U.S. Provisional Patent Application No. 62/557,121, entitled “Emoji Recording and Sending,” filed Sep. 11, 2017, the entire disclosures of each being herein incorporated by reference for all purposes.
  • BACKGROUND
  • Multimedia content, such as emojis, can be sent as part of messaging communications. Emojis can represent a variety of predefined people, objects, actions, and/or other things. Some messaging applications allow users to select from a predefined library of emojis, which can be sent as part of a message that can contain other content (e.g., other multimedia and/or textual content). Animojis are one type of this other multimedia content, where a user can select an avatar (e.g., a puppet) to represent themselves. The animoji can move and talk as if it were a video of the user. Animojis enable users to create personalized versions of emojis in a fun and creative way.
  • SUMMARY
  • Embodiments of the present disclosure can provide systems, methods, and computer-readable medium for implementing avatar video clip revision and playback techniques. In some examples, a computing device can present a user interface (UI) for tracking a user's face and presenting a virtual avatar representation (e.g., a puppet or video character version of the user's face). Upon identifying a request to record, the computing device can capture audio and video information, extract and detect context as well as facial feature characteristics and voice feature characteristics, revise the audio and/or video information based at least in part on the extracted/identified features, and present a video clip of the avatar using the revised audio and/or video information.
  • In some embodiments, a computer-implemented method for implementing various audio and video effects techniques may be provided. The method may include displaying a virtual avatar generation interface. The method may also include displaying first preview content of a virtual avatar in the virtual avatar generation interface, the first preview content of the virtual avatar corresponding to realtime preview video frames of a user headshot in a field of view of the camera and associated headshot changes in an appearance. The method may also include detecting an input in the virtual avatar generation interface while displaying the first preview content of the virtual avatar. In some examples, in response to detecting the input in the virtual avatar generation interface, the method may also include: capturing, via the camera, a video signal associated with the user headshot during a recording session, capturing, via the microphone, a user audio signal during the recording session, extracting audio feature characteristics from the captured user audio signal, and extracting facial feature characteristics associated with the face from the captured video signal. Additionally, in response to detecting expiration of the recording session, the method may also include: generating an adjusted audio signal from the captured audio signal based at least in part on the facial feature characteristics and the audio feature characteristics, generating second preview content of the virtual avatar in the virtual avatar generation interface according to the facial feature characteristics and the adjusted audio signal, and presenting the second preview content in the virtual avatar generation interface.
  • In some embodiments, the method may also include storing facial feature metadata associated with the facial feature characteristics extracted from the video signal and generating adjusted facial feature metadata from the facial feature metadata based at least in part on the facial feature characteristics and the audio feature characteristics. Additionally, the second preview of the virtual avatar may be displayed further according to the adjusted facial metadata. In some examples, the first preview of the virtual avatar may be displayed according to preview facial feature characteristics identified according to the changes in the appearance of the face during a preview session.
  • In some embodiments, an electronic device for implementing various audio and video effects techniques may be provided. The system may include a camera, a microphone, a library of pre-recorded/pre-determined audio, and one or more processors in communication with the camera and the microphone. In some examples, the processors may be configured to execute computer-executable instructions to perform operations. The operations may include detecting an input in a virtual avatar generation interface while displaying a first preview of a virtual avatar. The operations may also include initiating a capture session in response to detecting the input in the virtual avatar generation interface. The capture session may include: capturing, via the camera, a video signal associated with a face in a field of view of the camera, capturing, via the microphone, an audio signal associated with the captured video signal, extracting audio feature characteristics from the captured audio signal, and extracting facial feature characteristics associated with the face from the captured video signal. In some examples, the operations may also include generating an adjusted audio signal based at least in part on the audio feature characteristics and the facial feature characteristics and presenting the second preview content in the virtual avatar generation interface, at least in response to detecting expiration of the capture session.
  • In some instances, the audio signal may be further adjusted based at least in part on a type of the virtual avatar. Additionally, the type of the virtual avatar may be received based at least in part on an avatar type selection affordance presented in the virtual avatar generation interface. In some instances, the type of the virtual avatar may include an animal type, and the adjusted audio signal may be generated based at least in part on a predetermined sound associated with the animal type. The use and timing of predetermined sounds may be based on audio features from the captured audio and/or facial features from the captured video. This predetermined sound may also be itself modified based on audio features from the captured audio and facial features from the captured video. In some examples, the one or more processors may be further configured to determine whether a portion of the audio signal corresponds to the face in the field of view. Additionally, in accordance with a determination that the portion of the audio signal corresponds to the face, the portion of the audio signal may be stored for use in generating the adjusted audio signal and/or in accordance with a determination that the portion of the audio signal does not correspond to the face, at least the portion of the audio signal may be discarded and not considered for modification and/or playback. Additionally, the audio feature characteristics may comprise features of a voice associated with the face in the field of view. In some examples, the one or more processors may be further configured to store facial feature metadata associated with the facial feature characteristics extracted from the video signal. In some examples, the one or more processors may be further configured to store audio feature metadata associated with the audio feature characteristics extracted from the audio signal. Further, the one or more processors may be further configured to generate adjusted facial metadata based at least in part on the facial feature characteristics and the audio feature characteristics, and the second preview of the virtual avatar may be generated according to the adjusted facial metadata and the adjusted audio signal.
  • In some embodiments, a computer-readable medium may be provided. The computer-readable medium may include computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform operations. The operations may include performing the following actions in response to detecting a request to generate an avatar video clip of a virtual avatar: capturing, via a camera of an electronic device, a video signal associated with a face in a field of view of the camera, capturing, via a microphone of the electronic device, an audio signal, extracting voice feature characteristics from the captured audio signal, and extracting facial feature characteristics associated with the face from the captured video signal. The operations may also include performing the following actions in response to detecting a request to preview the avatar video clip: generating an adjusted audio signal based at least in part on the facial feature characteristics and the voice feature characteristics, and displaying a preview of the video clip of the virtual avatar using the adjusted audio signal.
  • In some embodiments, the audio signal may be adjusted based at least in part on a facial expression identified in the facial feature characteristics associated with the face. In some instances, the audio signal may be adjusted based at least in part on a level, pitch, duration, format, or change in a voice characteristic associated with the face. Further, in some embodiments, the one or more processors may be further configured to perform the operations comprising transmitting the video clip of the virtual avatar to another electronic device.
  • The following detailed description together with the accompanying drawings will provide a better understanding of the nature and advantages of the present disclosure.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a simplified block diagram illustrating example flow for providing audio and/or video effects techniques as described herein, according to at least one example.
  • FIG. 2 is another simplified block diagram illustrating example flow for providing audio and/or video effects techniques as described herein, according to at least one example.
  • FIG. 3 is another simplified block diagram illustrating hardware and software components for providing audio and/or video effects techniques as described herein, according to at least one example.
  • FIG. 4 is a flow diagram to illustrate providing audio and/or video effects techniques as described herein, according to at least one example.
  • FIG. 5 is another flow diagram to illustrate providing audio and/or video effects techniques as described herein, according to at least one example.
  • FIG. 6 is a simplified block diagram illustrating a user interface for providing audio and/or video effects techniques as described herein, according to at least one example.
  • FIG. 7 is another flow diagram to illustrate providing audio and/or video effects techniques as described herein, according to at least one example.
  • FIG. 8 is another flow diagram to illustrate providing audio and/or video effects techniques as described herein, according to at least one example.
  • FIG. 9 is a simplified block diagram illustrating a computer architecture for providing audio and/or video effects techniques as described herein, according to at least one example.
  • DETAILED DESCRIPTION
  • Certain embodiments of the present disclosure relate to devices, computer-readable medium, and methods for implementing various techniques for providing voice effects (e.g., revised audio) based at least in part on facial expressions. Additionally, in some cases, the various techniques may also provide video effects based at least in part on audio characteristics of a recording. Even further, the various techniques may also provide voice effects and video effects (e.g., together) based at least in part on one or both of facial expressions and audio characteristics of a recording. In some examples, the voice effects and/or video effects may be presented in a user interface (UI) configured to display a cartoon representation of a user (e.g., an avatar or digital puppet). Such an avatar that represents a user may be considered an animoji, as it may look like an emoji character familiar to most smart phone users; however, it can be animated to mimic actual motions of the user.
  • For example, a user of a computing device may be presented with a UI for generating an animoji video (e.g., a video clip). The video clip can be limited to a predetermined amount of time (e.g., 10 seconds, 30 seconds, or the like), or the video clip can be unlimited. In the UI, a preview area may present the user with a real-time representation of their face, using an avatar character. Various avatar characters may be provided, and a user may even be able to generate or import their own avatars. The preview area may be configured to provide an initial preview of the avatar and a preview of the recorded video clip. Additionally, the recorded video clip may be previewed in its original form (e.g., without any video or audio effects) or it may be previewed with audio and/or video effects. In some cases, the user may select an avatar after the initial video clip has been recorded. The video clip preview may then change from one avatar to another, with the same or different video effects applied to it, as appropriate. For example, if the raw preview (e.g., original form, without effects) is being viewed, and the user switches avatar characters, the UI may be updated to display a rendering of the same video clip but with the newly selected avatar. In other words, the facial features and audio (e.g., the user's voice) that were captured during the recording can be presented from any of the avatars (e.g., without any effects). In the preview, it will appear as if the avatar character is moving the same way the user moved during the recording, and speaking what the user said during the recording.
  • By way of example, a user may select a first avatar (e.g., a unicorn head) via the UI, or a default avatar can be initially provided. The UI will present the avatar (in this example, the head of a cartoon unicorn if selected by the user or any other available puppet by default) in the preview area, and the device will begin capturing audio and/or video information (e.g., using one or more microphones and/or one or more cameras). In some cases, only video information is needed for the initial preview screen. The video information can be analyzed, and facial features can be extracted. These extracted facial features can then be mapped to the unicorn face in real-time, such that the initial preview of the unicorn head appears to mirror that of the user's. In some cases, the term real-time is used to indicate that the results of the extraction, mapping, rendering, and presentation are performed in response to each motion of the user and can be presented substantially immediately. To the user, it will appear as if they are looking in the mirror, except the image of their face is replaced with an avatar.
  • While the user's face is in the line of sight (e.g., the view) of a camera of the device, the UI will continue to present the initial preview. Upon selection of a record affordance (e.g., a virtual button) on the UI, the device may begin to capture video that has an audio component. In some examples, this includes a camera capturing frames and a microphone capturing audio information. A special camera may be utilized that is capable of capturing 3-dimensional (3D) information as well. Additionally, in some examples, any camera may be utilized that is capable of capturing video. The video may be stored in its original form and/or metadata associated with the video may be stored. As such, capturing the video and/or audio information may be different from storing the information. For example, capturing the information may include sensing the information and at least caching it such that it is available for processing. The processed data can also be cached until it is determined whether to store or simply utilize the data. For example, during the initial preview, while the user's face is being presented as a puppet in real-time, the video data (e.g., metadata associated with the data) may be cached, while it is mapped to the puppet and presented. However, this data may not be stored permanently at all, such that the initial preview is not reusable or recoverable.
  • Alternatively, in some examples, once the user selects the record affordance of the UI, the video data and the audio data may be stored more permanently. In this way, the audio and video (A/V) data may be analyzed, processed, etc., in order to provide the audio and video effects described herein. In some examples, the video data may be processed to extract facial features (e.g., facial feature characteristics) and those facial features may be stored as metadata for the animoji video clip. The set of metadata may be stored with an identifier (ID) that indicates the time, date, and user associated with the video clip. Additionally, the audio data may be stored with the same or other ID. Once stored, or in some examples prior to storage, the system (e.g., processors of the device) may extract audio feature characteristics from the audio data and facial feature characteristics from the video file. This information can be utilized to identify context, key words, intent, and/or emotions of the user, and video and audio effects can be introduced into audio and video data prior to rendering the puppet. In some examples, the audio signal can be adjusted to include different words, sounds, tones, pitches, timing, etc., based at least in part on the extracted features. Additionally, in some examples, the video data (e.g., the metadata) can also be adjusted. In some examples, audio features are extracted in real-time during the preview itself. These audio features may be avatar specific, generated only if the associated avatar is being previewed. Alternatively, the audio features may be avatar agnostic, generated for all avatars. The audio signal can also be adjusted in part based on these real-time audio feature extractions, and with the pre-stored extracted video features which are created during or after the recording process, but before previewing.
  • Once the video and audio data have been adjusted based at least in part on the extracted characteristics, a second preview of the puppet can be rendered. This rendering may be performed for each possible puppet, such that, as the user scrolls through and selects different puppets, the adjusted data is already rendered. Or the rendering can be performed after selection of each puppet. In any event, once the user selects a puppet, the second preview can be presented. The second preview will replay the video clip that was recorded by the user, but with the adjusted audio and/or video. Using the example from above, if the user recorded themselves with an angry tone (e.g., with a gruff voice and a furrowed brow), the context or intent of anger may be detected, and the audio file may be adjusted to include a growling sound. Thus, the second preview would look like a unicorn saying the words that the user said; however, the voice of the user may be adjusted to sound like a growl, or to make the tone more baritone (e.g., lower). The user could then save the second preview or select it for transmission to another user (e.g., through a messaging application or the like). In some examples, the animoji video clips described above and below can be shared as .mov files. However, in other examples, the described techniques can be used in real-time (e.g., with video messaging or the like).
• FIG. 1 is a simplified block diagram illustrating example flow 100 for providing audio and/or video effects based at least in part on audio and/or video features detected in a user's recording. In example flow 100, there are two separate sessions: recording session 102 and playback session 104. In recording session 102, device 106 may capture video having an audio component of user 108 at block 110. In some examples, the video and audio may be captured (e.g., collected) separately, using two different devices (e.g., a microphone and a camera). The capturing of video and audio may be triggered based at least in part on selection of a record affordance by user 108. In some examples, user 108 may say the word "hello" at block 112. Additionally, at block 112, device 106 may continue to capture the video and/or audio components of the user's actions. At block 114, device 106 can continue capturing the video and audio components, and in this example, user 108 may say the word "bark." At block 114, device 106 may also extract spoken words from the audio information. However, in other examples, the spoken word extraction (or any audio feature extraction) may actually take place after recording session 102 is complete. In other examples, the spoken word extraction (or any audio feature extraction) may actually take place during the preview block 124 in real-time. It is also possible for the extraction (e.g., analysis of the audio) to be done in real-time while recording session 102 is still in progress. In any of these cases, the avatar process being executed by device 106 may identify through the extraction that the user said the word "bark" and may employ some logic to determine what audio effects to implement.
  • By way of example, recording session 102 may end when user 108 selects the record affordance again (e.g., indicating a desire to end the recording), selects an end recording affordance (e.g., the record affordance may act as an end recording affordance while recording), or based at least in part on expiration of a time period (e.g., 10 seconds, 30 seconds, or the like). In some cases, this time period may be automatically predetermined, while in others, it may be user selected (e.g., selected from a list of options or entered in free form through a text entry interface). Once the recording has completed, user 108 may select a preview affordance, indicating that user 108 wishes to watch a preview of the recording. One option could be to play the original recording without any visual or audio effects. However, another option could be to play a revised version of the video clip. Based at least in part on detection of the spoken word “bark,” the avatar process may have revised the audio and/or video of the video clip.
• At block 116, device 106 may present avatar (also called a puppet and/or animoji) 118 on a screen. Device 106 may also be configured with speaker 120 that can play audio associated with the video clip. In this example, block 116 corresponds to the same point in time as block 110, where user 108 may have had his mouth open, but was not yet speaking. As such, avatar 118 may be presented with his mouth open; however, no audio is presented from speaker 120 yet. At block 122, corresponding to block 112 where user 108 said "hello," the avatar process can present avatar 118 with an avatar-specific voice. In other words, a predefined dog voice may be used to say the word "hello" at block 122. The dog-voice word "hello" can be presented by speaker 120. As will be described in further detail below, there are a variety of different animal (and other character) avatars available for selection by user 108. In some examples, each avatar may be associated with a particular pre-defined voice that best fits that avatar. For example, a dog may have a dog voice, a cat may have a cat voice, a pig may have a pig voice, and a robot may have a robotic voice. These avatar-specific voices may be pre-recorded or may be associated with particular frequency or audio transformations that can be applied by executing mathematical operations on the original sound, such that any user's voice can be transformed to sound like the dog voice. However, each user's dog voice may sound different based at least in part on the particular audio transformation performed.
• At block 124, the avatar process may replace the spoken word (e.g., "bark") with an avatar-specific word. In this example, the sound of a dog bark (e.g., a recorded or simulated dog bark) may be inserted into the audio data (e.g., in place of the word "bark") such that when it is played back during presentation of the video clip, a "woof" is presented by speaker 120. In some examples, different avatar-specific words will be presented at 124 based at least in part on different avatar selections, and in other examples, the same avatar-specific word may be presented regardless of the avatar selections. For example, if user 108 said "bark," a "woof" could be presented when the dog avatar is selected. However, in this same case, if user 108 later selected the cat avatar for the same flow, there are a couple of options for revising the audio. In one example, the process could convert the "bark" into a "woof" even though it wouldn't be appropriate for a cat to "woof." In a different example, the process could convert "bark" into a recorded or simulated "meow," based at least in part on the selection of the cat avatar. In yet another example, the process could ignore the "bark" for avatars other than the dog avatar. As such, there may be a second level of audio feature analysis performed even after the extraction at 114. Video and audio features may also influence processing on the avatar-specific utterances. For example, the level, pitch, and intonation with which a user says "bark" may be detected as part of the audio feature extraction, and this may direct the system to select a specific "woof" sample or transform such a sample before and/or during the preview process.
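A minimal sketch of this kind of keyword-to-sound mapping is shown below, assuming a simple policy of converting, substituting, or ignoring the keyword per avatar; the enum cases, sample names, and scaling rules are hypothetical placeholders rather than the actual logic.

```swift
import Foundation

// Hypothetical word-spotting policy: a spoken keyword ("bark") is mapped to an
// avatar-specific sound, converted for the selected avatar, or ignored when no
// mapping exists. All names and values are illustrative.
enum AvatarType { case dog, cat, robot, unicorn }

struct SoundEffect {
    let sampleName: String        // e.g., a prerecorded "woof" sample
    var playbackLevel: Float      // scaled from the detected level of the keyword
    var pitchShift: Float         // scaled from the detected pitch/intonation
}

func effect(forKeyword word: String,
            avatar: AvatarType,
            detectedLevel: Float,
            detectedPitch: Float) -> SoundEffect? {
    let sample: String?
    switch (word, avatar) {
    case ("bark", .dog): sample = "woof"
    case ("bark", .cat): sample = "meow"     // convert for the cat avatar
    case ("bark", _):    sample = nil        // ignore for avatars with no mapping
    default:             sample = nil
    }
    guard let name = sample else { return nil }
    // Level and pitch of the user's utterance shape how the sample is rendered,
    // e.g., shift the sample up for a higher-pitched utterance.
    return SoundEffect(sampleName: name,
                       playbackLevel: detectedLevel,
                       pitchShift: detectedPitch > 1.0 ? 2.0 : 0.0)
}
```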
  • FIG. 2 is another simplified block diagram illustrating example flow 200 for providing audio and/or video effects based at least in part on audio and/or video features detected in a user's recording. In example flow 200, much like in example flow 100 of FIG. 1, there are two separate sessions: recording session 202 and playback session 204. In recording session 202, device 206 may capture video having an audio component of user 208 at block 210. The capturing of video and audio may be triggered based at least in part on selection of a record affordance by user 208. In some examples, user 208 may say the word “hello” at block 212. Additionally, at block 212, device 206 may continue to capture the video and/or audio components of the user's actions. At block 214, device 206 can continue capturing the video and audio components, and in this example, user 208 may hold his mouth open, but not say anything. At block 214, device 206 may also extract facial expressions from the video. However, in other examples, the facial feature extraction (or any video feature extraction) may actually take place after recording session 202 is complete. Still, it is possible for the extraction (e.g., analysis of the video) to be done in real-time while recording session 202 is still in process. In either case, the avatar process being executed by device 206 may identify through the extraction that the user opened his mouth briefly (e.g., without saying anything) and may employ some logic to determine what audio and/or video effects to implement. In some examples, the determination that the user held their mouth open without saying anything may require extraction and analysis of both audio and video. For example, extraction of the facial feature characteristics (e.g., open mouth) may not be enough, and the process may also need to detect that user 208 did not say anything during the same time period of the recording. Video and audio features may also influence processing on the avatar specific utterances. For example, the duration of the opening of the mouth, opening of eyes, etc. may direct the system to select a specific “woof” sample or transform such a sample before and/or during the preview process. One such transformation is changing the level and/or duration of the woof to match the detected opening and closing of the user's mouth.
  • By way of example, recording session 202 may end when user 208 selects the record affordance again (e.g., indicating a desire to end the recording), selects an end recording affordance (e.g., the record affordance may act as an end recording affordance while recording), or based at least in part on expiration of a time period (e.g., 20 seconds, 30 seconds, or the like). Once the recording has finished, user 208 may select a preview affordance, indicating that user 208 wishes to watch a preview of the recording. One option could be to play the original recording without any visual or audio effects. However, another option could be to play a revised version of the recording. Based at least in part on detection of the facial expression (e.g., the open mouth), the avatar process may have revised the audio and/or video of the video clip.
  • At block 216, device 206 may present avatar (also called a puppet and/or animoji) 218 on a screen of device 206. Device 206 may also be configured with speaker 220 that can play audio associated with the video clip. In this example, block 216 corresponds to the same point in time as block 210, where user 208 may not have been speaking yet. As such, avatar 218 may be presented with his mouth open; however, no audio is presented from speaker 220 yet. At block 222, corresponding to block 212 where user 208 said “hello,” the avatar process can present avatar 218 with an avatar-specific voice (as described above).
• At block 224, the avatar process may replace the silence identified at block 214 with an avatar-specific word. In this example, the sound of a dog bark (e.g., a recorded or simulated dog bark) may be inserted into the audio data (e.g., in place of the silence) such that when it is played back during presentation of the video clip, a "woof" is presented by speaker 220. In some examples, different avatar-specific words will be presented at 224 based at least in part on different avatar selections, and in other examples, the same avatar-specific word may be presented regardless of the avatar selections. For example, if user 208 held his mouth open, a "woof" could be presented when the dog avatar is selected, a "meow" sound could be presented for a cat avatar, etc. In some cases, each avatar may have a predefined sound to be played when it is detected that user 208 has held his mouth open for an amount of time (e.g., a half second, a whole second, etc.) without speaking. However, in some examples, the process could ignore the detection of the open mouth for avatars that don't have a predefined effect for that facial feature. Additionally, there may be a second level of audio feature analysis performed even after the extraction at 214. For example, if the process determines that a "woof" is to be inserted for a dog avatar (e.g., based on detection of the open mouth), the process may also detect how many "woof" sounds to insert (e.g., if the user held his mouth open for double the length of time used to indicate a bark) or whether it is not possible to insert the number of barks requested (e.g., in the scenario of FIG. 1, where the user would speak "bark" to indicate that a "woof" sound should be inserted). Thus, based on the above two examples, it should be evident that user 208 can control effects of the playback (e.g., the recorded avatar message) with their facial and voice expressions. Further, while not shown explicitly in either FIG. 1 or FIG. 2, the user device can be configured with software for executing the avatar process (e.g., capturing the A/V information, extracting features, analyzing the data, implementing the logic, revising the audio and/or video files, and rendering the previews) as well as software for executing an application (e.g., an avatar application with its own UI) that enables the user to build the avatar messages and subsequently send them to other user devices.
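The following sketch illustrates, under stated assumptions, how a video cue (mouth held open) and an audio cue (silence) might be required to overlap before an avatar-specific sound is inserted and stretched to the detected opening; the half-second threshold, type names, and return shape are illustrative only.

```swift
import Foundation

// Hypothetical combination of a video cue (mouth held open) with an audio cue
// (silence over the same interval) before inserting an avatar-specific sound,
// then stretching the sound to match the detected mouth opening.
struct Interval { let start: TimeInterval; let duration: TimeInterval }

func avatarSound(mouthOpen: Interval?,
                 silence: Interval?,
                 avatarSample: String,
                 sampleDuration: TimeInterval)
    -> (sample: String, start: TimeInterval, duration: TimeInterval)? {
    // Both cues must overlap: an open mouth while speaking should not trigger the effect.
    guard let mouth = mouthOpen, let quiet = silence else { return nil }
    let overlapStart = max(mouth.start, quiet.start)
    let overlapEnd = min(mouth.start + mouth.duration, quiet.start + quiet.duration)
    guard overlapEnd - overlapStart > 0.5 else { return nil }   // e.g., at least half a second

    // Stretch the sample so the "woof" tracks the mouth opening and closing.
    let target = overlapEnd - overlapStart
    return (avatarSample, overlapStart, max(sampleDuration, target))
}
```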
• FIG. 3 is a simplified block diagram 300 illustrating components (e.g., software modules) utilized by the avatar process described above and below. In some examples, more or fewer modules can be utilized to implement the providing of audio and/or video effects based at least in part on audio and/or video features detected in a user's recording. In some examples, device 302 may be configured with camera 304, microphone 306, and a display screen for presenting a UI and the avatar previews (e.g., the initial preview before recording as well as the preview of the recording before sending). In some examples, the avatar process is configured with avatar engine 308 and voice engine 310. Avatar engine 308 can manage the list of avatars, process the video features (e.g., facial feature characteristics), revise the video information, communicate with voice engine 310 when appropriate, and render video of the avatar 312 when all processing is complete and effects have been implemented (or discarded). Revising of the video information can include adjusting or otherwise editing the metadata associated with the video file. In this way, when the video metadata (adjusted or not) is used to render the puppet, the facial features can be mapped to the puppet. In some examples, voice engine 310 can store the audio information, perform the logic for determining what effects to implement, revise the audio information, and provide modified audio 314 when all processing is complete and effects have been implemented (or discarded).
• In some examples, once the user selects to record a new avatar video clip, video features 316 can be captured by camera 304 and audio features 318 can be captured by microphone 306. In some cases there may be as many as (or more than) fifty facial features to be detected within video features 316. Example video features include, but are not limited to, duration of expressions, open mouth, frowns, smiles, eyebrows up or furrowed, etc. Additionally, video features 316 may include only metadata that identifies each of the facial features (e.g., data points that indicate which locations on the user's face moved or what position they were in). Further, video features 316 can be passed to avatar engine 308 and voice engine 310. At avatar engine 308, the metadata associated with video features 316 can be stored and analyzed. In some examples, avatar engine 308 may perform the feature extraction from the video file prior to storing the metadata. However, in other examples, the feature extraction may be performed prior to video features 316 being sent to avatar engine 308 (in which case, video features 316 would be the metadata itself). At voice engine 310, video features 316 may be compared with audio features 318 when it is helpful to match up which audio features correspond to which video features (e.g., to see if certain audio and video features occur at the same time).
• In some instances, audio features are also passed to voice engine 310 for storage. Example audio features include, but are not limited to, level, pitch, and dynamics (e.g., changes in level, pitch, voicing, formants, duration, etc.). Raw audio 320 includes the unprocessed audio file as it is captured. Raw audio 320 can be passed to voice engine 310 for further processing and potential (e.g., eventual) revision, and it can also be stored separately so that the original audio can be used if desired. Raw audio 320 can also be passed to voice recognition module 322. Voice recognition module 322 can be used to spot key words and identify a user's intent from their voice. For example, voice recognition module 322 can determine when a user is angry, sad, happy, or the like. Additionally, when a user says a key word (e.g., "bark" as described above), voice recognition module 322 will detect this. Information detected and/or collected by voice recognition module 322 can then be passed to voice engine 310 for further logic and/or processing. As noted, in some examples, audio features are extracted in real-time during the preview itself. These audio features may be avatar specific (generated only if the associated avatar is being previewed) or avatar agnostic (generated for all avatars). The audio signal can also be adjusted in part based on these real-time audio feature extractions, along with the pre-stored extracted video features, which are created during or after the recording process but before previewing. Additionally, some feature extraction may be performed during rendering at 336 by voice engine 310. Some pre-stored sounds 338 may be used by voice engine 310, as appropriate, to fill in the blanks or to replace other sounds that were extracted.
• In some examples, voice engine 310 will make the determination regarding what to do with the information extracted from voice recognition module 322. In some examples, voice engine 310 can pass the information from voice recognition module 322 to feature module 324 for determining which features correspond to the data extracted by voice recognition module 322. For example, feature module 324 may indicate (e.g., based on a set of rules and/or logic) that a sad voice detected by voice recognition module 322 corresponds to a raising of the pitch of the voice, or the slowing down of the speed or cadence of the voice. In other words, feature module 324 can map the extracted audio features to particular voice features. Then, effect type module 326 can map the particular voice features to the desired effect. Voice engine 310 can also be responsible for storing each particular voice for each possible avatar. For example, there may be standard or hardcoded voices for each avatar. Without any other changes being made, if a user selects a particular avatar, voice engine 310 can select the appropriate standard voice for use with playback. In this case, modified audio 314 may just be raw audio 320 transformed to the appropriate avatar voice based on the selected avatar. As the user scrolls through the avatars and selects different ones, voice engine 310 can modify raw audio 320 on the fly to make it sound like the newly selected avatar. Thus, avatar type 328 needs to be provided to voice engine 310 to make this change. However, if an effect is to be provided (e.g., the pitch, tone, or actual words are to be changed within the audio file), voice engine 310 can revise raw audio file 320 and provide modified audio 314. In some examples, the user will be provided with an option to use the original audio file at on/off 330. If the user selects "off" (e.g., effects off), then raw audio 320 can be combined with video of avatar 312 (e.g., corresponding to the unchanged video) to make A/V output 332. A/V output 332 can be provided to the avatar application presented on the UI of device 302.
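A rough sketch of this chain, under a simplified model, is shown below: a recognized intent is mapped to voice features (standing in for feature module 324), those features select an adjustment (standing in for effect type module 326), and the avatar's standard voice transform is applied unless effects are switched off. The intent values, scaling factors, and the naive resampling are all illustrative assumptions, not the described implementation.

```swift
import Foundation

enum DetectedIntent { case sad, angry, happy, neutral }

struct VoiceFeatures { var pitchScale: Float; var speedScale: Float }

// Stand-in for feature module 324: map a recognized intent to voice features.
func featureModule(_ intent: DetectedIntent) -> VoiceFeatures {
    switch intent {
    case .sad:   return VoiceFeatures(pitchScale: 1.2, speedScale: 0.8) // raise pitch, slow cadence
    case .angry: return VoiceFeatures(pitchScale: 0.8, speedScale: 1.0) // lower toward a growl
    default:     return VoiceFeatures(pitchScale: 1.0, speedScale: 1.0)
    }
}

// Naive speed change by resampling (this also shifts pitch; a real effect
// chain would use a proper time-stretch and pitch-shift stage).
func applySpeed(_ samples: [Float], scale: Float) -> [Float] {
    guard scale > 0, !samples.isEmpty else { return samples }
    let outCount = Int(Float(samples.count) / scale)
    return (0..<outCount).map { samples[min(samples.count - 1, Int(Float($0) * scale))] }
}

func modifiedAudio(rawAudio: [Float],
                   intent: DetectedIntent,
                   avatarVoiceTransform: ([Float]) -> [Float],
                   effectsOn: Bool) -> [Float] {
    guard effectsOn else { return rawAudio }            // "off": keep the original audio
    let avatarVoice = avatarVoiceTransform(rawAudio)    // standard voice for the selected avatar
    let features = featureModule(intent)                // stand-in for effect type module 326
    return applySpeed(avatarVoice, scale: features.speedScale)
}
```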
• Avatar engine 308 can be responsible for providing the initial avatar image based at least in part on the selection of avatar type 328. Additionally, avatar engine 308 is responsible for mapping video features 316 to the appropriate facial markers of each avatar. For example, if video features 316 indicate that the user is smiling, the metadata that indicates a smile can be mapped to the mouth area of the selected avatar so that the avatar appears to be smiling in video of avatar 312. Additionally, avatar engine 308 can receive timing changes 334 from voice engine 310, as appropriate. For example, if voice engine 310 determines that the voice effect should make the audio more of a whispering voice (e.g., based on feature module 324 and/or effect type module 326 and/or the avatar type), and modifies the voice to be more of a whispered voice, this effect change may include slowing down the voice itself, in addition to a reduced level and other formant and pitch changes. Accordingly, the voice engine may produce modified audio that is slower in playback speed relative to the original audio file for the audio clip. In this scenario, voice engine 310 would need to instruct avatar engine 308 via timing changes 334, so that the video file can be slowed down appropriately; otherwise, the video and audio would not be synchronized.
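The handshake can be pictured as in the sketch below, where the voice engine reports a playback speed factor and the avatar engine retimes its facial-feature timeline by the same factor; the protocol name, the 0.8 speed factor, and the timestamp representation are hypothetical.

```swift
import Foundation

// Hypothetical timing-change handshake: when the audio is slowed (e.g., a
// whisper effect), the new playback speed is reported so the facial-feature
// metadata can be retimed to stay in sync.
protocol TimingChangeReceiver {
    func applyTimingChange(speedFactor: Double)   // e.g., 0.8 = 20% slower
}

final class AvatarEngineSketch: TimingChangeReceiver {
    var frameTimestamps: [TimeInterval] = []

    func applyTimingChange(speedFactor: Double) {
        // Stretch the video metadata timeline by the same factor as the audio.
        frameTimestamps = frameTimestamps.map { $0 / speedFactor }
    }
}

final class VoiceEngineSketch {
    var timingReceiver: TimingChangeReceiver?

    func applyWhisperEffect(audioDuration: TimeInterval) -> TimeInterval {
        let speedFactor = 0.8                      // slower, softer delivery
        timingReceiver?.applyTimingChange(speedFactor: speedFactor)
        return audioDuration / speedFactor         // modified audio is longer
    }
}
```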
• As noted, a user may use the avatar application of device 302 to select different avatars. In some examples, the voice effect can change based at least in part on this selection. However, in other examples, the user may be given the opportunity to select a different voice for a given avatar (e.g., the cat voice for the dog avatar, etc.). This type of free-form voice effect change can be executed by the user via selection on the UI or, in some cases, with voice activation or face motion. For example, a certain facial expression could trigger voice engine 310 to change the voice effect for a given avatar. Further, in some examples, voice engine 310 may be configured to make children's voices sound more high pitched or, alternatively, determine not to make a child's voice more high pitched because it would sound inappropriate given that raw audio 320 for a child's voice might already be high pitched. This user-specific determination of an effect could be driven in part by the extracted audio features, which in this case could include pitch values and ranges throughout the recording.
• In some examples, voice recognition module 322 may include a recognition engine, a word spotter, a pitch analyzer, and/or a formant analyzer. The analysis performed by voice recognition module 322 will be able to identify if the user is upset, angry, happy, etc. Additionally, voice recognition module 322 may be able to identify context and/or intonation of the user's voice, as well as changes in the intention of the wording, and/or determine a profile (e.g., a virtual identity) of the user.
  • In some examples, the avatar process 300 can be configured to package/render the video clip by combining video of avatar 312 and either modified audio 314 or raw audio 320 into A/V output 332. In order to package the two, voice engine 310 just needs to know an ID for the metadata associated with video of avatar 312 (e.g., it does not actually need video of avatar 312, it just needs the ID of the metadata). A message within a messaging application (e.g., the avatar application) can be transmitted to other computing devices, where the message includes A/V output 332. When a user selects a “send” affordance in the UI, the last video clip to be previewed can be sent. For example, if a user previews their video clip with the dog avatar, and then switches to the cat avatar for preview, the cat avatar video would be sent when the user selects “send.” Additionally, the state of the last preview can be stored and used later. For example, if the last message (e.g., avatar video clip) sent used a particular effect, the first preview of the next message being generated can utilize that particular effect.
• The logic implemented by voice engine 310 and/or avatar engine 308 can check for certain cues and/or features, and then revise the audio and/or video files to implement the desired effect. One example feature/effect pair is detecting that a user has opened their mouth and paused for a moment. In this example, both facial feature characteristics (e.g., mouth open) and audio feature characteristics (e.g., silence) need to happen at the same time in order for the desired effect to be implemented. For this feature/effect pair, the desired effect is to revise the audio and video so that the avatar appears to make an avatar/animal-specific sound. For example, a dog will make a bark sound, a cat will make a meow sound, and a monkey, horse, unicorn, etc., will make the appropriate sound for that character/animal. Another example feature/effect pair is lowering the audio pitch and/or tone when a frown is detected. In this example, only the video feature characteristics need to be detected. However, in some examples, this effect could be implemented based at least in part on voice recognition module 322 detecting sadness in the voice of the user; in this case, video features 316 would not be needed at all. Yet another example feature/effect pair is whispering, which causes the audio and video to be slowed down and toned down (e.g., reduced in level and variation). In some cases, video changes can lead to modifications of the audio while, in other cases, audio changes can lead to modifications of the video.
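These cue/effect pairings could be expressed as a small rule table, as in the sketch below; the cue flags and the co-occurrence requirement for the open-mouth case follow the description above, while everything else (type names, effect labels) is an illustrative assumption.

```swift
import Foundation

// Hypothetical rule table for the cue/effect pairs described above.
struct Cues {
    var mouthOpen = false
    var silent = false
    var frowning = false
    var whispering = false
    var sadVoice = false
}

enum Effect {
    case insertAvatarSound       // e.g., bark/meow for the selected avatar
    case lowerPitchAndTone
    case slowAudioAndVideo
}

func effects(for cues: Cues) -> [Effect] {
    var result: [Effect] = []
    // Both the video cue and the audio cue must co-occur for this pair.
    if cues.mouthOpen && cues.silent { result.append(.insertAvatarSound) }
    // Either a detected frown (video) or detected sadness (audio) is enough.
    if cues.frowning || cues.sadVoice { result.append(.lowerPitchAndTone) }
    // Whispering slows and tones down both the audio and the video.
    if cues.whispering { result.append(.slowAudioAndVideo) }
    return result
}
```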
  • As noted above, in some examples, avatar engine 308 may act as the feature extractor, in which case video features 316 and audio features 318 may not exist prior to being sent to avatar engine 308. Instead, raw audio 320 and metadata associated with the raw video may be passed into avatar engine 308, where avatar engine 308 may extract the audio feature characteristics and the video (e.g., facial) feature characteristics. In other words, while not drawn this way in FIG. 3, parts of avatar engine 308 may actually exist within camera 304. Additionally, in some examples, metadata associated with video features 316 can be stored in a secure container, and when voice engine 310 is running, it can read the metadata from the container.
• In some instances, because the preview video clip of the avatar is not displayed in real-time (e.g., it is rendered and displayed after the video is recorded and sometimes only in response to selection of a play affordance), the audio and video information can be processed offline (e.g., not in real-time). As such, avatar engine 308 and voice engine 310 can read ahead in the audio and video information and make context decisions up front. Then, voice engine 310 can revise the audio file accordingly. This ability to read ahead and make decisions offline will greatly increase the efficiency of the system, especially for longer recordings. Additionally, this enables a second stage of analysis, where additional logic can be processed. Thus, the entire audio file can be analyzed before making any final decisions. For example, if the user says "bark" two times in a row, but the two instances of "bark" were said too close together, the actual "woof" sound that was prerecorded might not be able to fit in the time it took the user to say "bark, bark." In this case, voice engine 310 can take the information from voice recognition module 322 and determine to ignore the second "bark," because it won't be possible to include both "woof" sounds in the audio file.
• As noted above, when the audio file and the video are packaged together to make A/V output 332, voice engine 310 does not actually need to access video of avatar 312. Instead, the video file (e.g., a .mov format file, or the like) is created as the video is being played by accessing an array of features (e.g., floating-point values) that were written to the metadata file. However, all permutations/adjustments to the audio and video files can be done in advance, and some can even be done in real-time as the audio and video are extracted. Additionally, in some examples, each modified video clip could be saved temporarily (e.g., cached), such that if the user reselects an avatar that's already been previewed, the processing to generate/render that particular preview does not need to be duplicated. As opposed to re-rendering the revised video clip each time the same avatar is selected during the preview session, the above-noted caching of rendered video clips can yield large savings in processor power and instructions per second (IPS), especially for longer recordings and/or recordings with a large number of effects.
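A minimal sketch of such a cache, assuming rendered clips are keyed by an avatar identifier and invalidated when a new recording is made, is shown below; the class and method names are hypothetical.

```swift
import Foundation

// Hypothetical cache of rendered previews per avatar, so re-selecting an
// avatar during the preview session does not repeat the rendering work.
final class PreviewCache {
    private var rendered: [String: URL] = [:]    // avatar identifier -> rendered clip

    func preview(for avatar: String, render: () -> URL) -> URL {
        if let cached = rendered[avatar] { return cached }
        let clip = render()                      // expensive: effects + rendering
        rendered[avatar] = clip
        return clip
    }

    func invalidate() { rendered.removeAll() }   // e.g., after a new recording
}
```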
• Additionally, in some examples, noise suppression algorithms can be employed for handling cases where the sound captured by microphone 306 includes sounds other than the user's voice, for example, when the user is in a windy area or a loud room (e.g., a restaurant or bar). In these examples, a noise suppression algorithm could lower the decibel output of certain parts of the audio recording. Alternatively, or in addition, different voices could be separated and/or only audio coming from certain angles of view (e.g., the angle of the user's face) could be collected, and other voices could be ignored or suppressed. In other cases, if the avatar process 300 determines that the noise levels are too loud or will be difficult to process, the process 300 could disable the recording option.
  • FIG. 4 illustrates an example flow diagram showing process 400 for implementing various audio and/or video effects based at least in part on audio and/or video features, according to at least a few embodiments. In some examples, computing device 106 of FIG. 1 or other similar user device (e.g., utilizing at least avatar process 300 of FIG. 3) may perform the process 400 of FIG. 4.
  • At block 402, computing device 106 may capture video having an audio component. In some examples, the video and audio may be captured by two different hardware components (e.g., a camera may capture the video information while a microphone may capture the audio information). However, in some instances, a single hardware component may be configured to capture both audio and video. In any event, the video and audio information may be associated with one another (e.g., by sharing an ID, timestamp, or the like). As such, the video may have an audio component (e.g., they are part of the same file), or the video may be linked with an audio component (e.g., two files that are associated together).
• At block 404, computing device 106 may extract facial features and audio features from the captured video and audio information, respectively. In some cases, the facial feature information may be extracted via avatar engine 308 and stored as metadata. The metadata can be used to map each facial feature to a particular puppet or to any animation or virtual face. Thus, the actual video file does not need to be stored, which provides significant memory savings. Regarding the audio feature extraction, a voice recognition algorithm can be utilized to extract different voice features; for example, words, phrases, pitch, speed, etc.
  • At block 406, computing device 106 may detect context from the extracted features. For example, context may include a user's intent, mood, setting, location, background items, ideas, etc. The context can be important when employing logic to determine what effects to apply. In some cases, the context can be combined with detected spoken words to determine whether and/or how to adjust the audio file and/or the video file. In one example, a user may furrow his eyebrows and speak slowly. The furrowing of the eyebrows is a video feature that could have been extracted at block 404 and the slow speech is an audio feature that could have been extracted at block 404. Individually, those two features might mean something different; however, when combined together, the avatar process can determine that the user is concerned about something. In this case, the context of the message might be that a parent is speaking to a child, or a friend is speaking to another friend about a serious or concerning matter.
• At block 408, computing device 106 may determine effects for rendering the audio and/or video files based at least in part on the context. As noted above, one context might be concern. As such, a particular video and/or audio feature may be employed for this effect. For example, the voice file may be adjusted to sound more somber, or to be slowed down. In other examples, the avatar-specific voice might be replaced with a version of the original (e.g., raw) audio to convey the seriousness of the message. Various other effects can be employed for various other contexts. In other examples, the context may be animal noises (e.g., based on the user saying "bark" or "meow" or the like). In this case, the determined effect would be to replace the spoken word "bark" with the sound of a dog barking.
  • At block 410, computing device 106 may perform additional logic for additional effects. For example, if the user attempted to effectuate the bark effect by saying bark twice in a row, the additional logic may need to be utilized to determine whether the additional bark is technically feasible. As an example, if the audio clip of the bark that is used to replace the spoken word in the raw audio information is 0.5 seconds long, but the user says “bark” twice in a 0.7-second span, the additional logic can determine that two bark sounds cannot fit in the 0.7 seconds available. Thus, the audio and video file may need to be extended in order to fit both bark sounds, the bark sound may need to be shortened (e.g., by processing the stored bark sound), or the second spoken word bark may need to be ignored.
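The timing check at block 410 can be illustrated with a small fitting routine, sketched below under the simplifying assumption that a second keyword is simply ignored when its onset falls within one sample length of the previously accepted one (extending the clip or shortening the sample, also mentioned above, are not shown); the function name and rule are illustrative.

```swift
import Foundation

// Hypothetical fitting check: given the onset times of the spoken keyword and
// the length of the replacement sound, keep only occurrences that leave room
// for the full sample.
func placements(forKeywordTimes times: [TimeInterval],
                sampleDuration: TimeInterval) -> [TimeInterval] {
    var accepted: [TimeInterval] = []
    for t in times {
        if let last = accepted.last, t - last < sampleDuration {
            continue   // the next "bark" arrived too soon: ignore it
        }
        accepted.append(t)
    }
    return accepted
}

// Example: with a 0.5 s sample, keywords at 1.0 s and 1.3 s keep only the
// first occurrence, while keywords at 1.0 s and 1.8 s keep both.
```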
  • At block 412, computing device 106 may revise the audio and/or video information based at least in part on the determined effects and/or additional effects. In some examples, only one set of effects may be used. However, in either case, the raw audio file may be adjusted (e.g., revised) to form a new audio file with additional sounds added and/or subtracted. For example, in the “bark” use case, the spoken word “bark” will be removed from the audio file and a new sound that represents an actual dog barking will be inserted. The new file can be saved with a different ID, or with an appended ID (e.g., the raw audio ID, with a .v2 identifier to indicate that it is not the original). Additionally, the raw audio file will be saved separately so that it can be reused for additional avatars and/or if the user decides not to use the determined effects.
  • At block 414, computing device 106 may receive a selection of an avatar from the user. The user may select one of a plurality of different avatars through a UI of the avatar application being executed by computing device 106. The avatars may be selected via a scroll wheel, drop down menu, or icon menu (e.g., where each avatar is visible on the screen in its own position).
• At block 416, computing device 106 may present the revised video with the revised audio based at least in part on the selected avatar. In this example, each adjusted video clip (e.g., a final clip for the avatar that has adjusted audio and/or adjusted video) may be generated for each respective avatar prior to selection of the avatar by the user. This way, the processing has already been completed, and the adjusted video clip is ready to be presented immediately upon selection of the avatar. While this might require additional IPS prior to avatar selection, it will speed up the presentation. Additionally, the processing of each adjusted video clip can be performed while the user is reviewing the first preview (e.g., the preview that corresponds to the first/default avatar presented in the UI).
  • FIG. 5 illustrates an example flow diagram showing process 500 for implementing various audio and/or video effects based at least in part on audio and/or video features, according to at least a few embodiments. In some examples, computing device 106 of FIG. 1 or other similar user device (e.g., utilizing at least avatar process 300 of FIG. 3) may perform the process 500 of FIG. 5.
  • At block 502, computing device 106 may capture video having an audio component. Just like in block 402 of FIG. 4, the video and audio may be captured by two different hardware components (e.g., a camera may capture the video information while a microphone may capture the audio information). As noted, the video may have an audio component (e.g., they are part of the same file), or the video may be linked with an audio component (e.g., two files that are associated together).
• At block 504, computing device 106 may extract facial features and audio features from the captured video and audio information, respectively. Just like above, the facial feature information may be extracted via avatar engine 308 and stored as metadata. The metadata can be used to map each facial feature to a particular puppet or to any animation or virtual face. Thus, the actual video file does not need to be stored, which provides significant memory savings. Regarding the audio feature extraction, a voice recognition algorithm can be utilized to extract different voice features; for example, words, phrases, pitch, speed, etc. Additionally, in some examples, avatar engine 308 and/or voice engine 310 may perform the audio feature extraction.
  • At block 506, computing device 106 may detect context from the extracted features. For example, context may include a user's intent, mood, setting, location, ideas, identity, etc. The context can be important when employing logic to determine what effects to apply. In some cases, the context can be combined with spoken words to determine whether and/or how to adjust the audio file and/or the video file. In one example, a user's age may be detected as the context (e.g., child, adult, etc.) based at least in part on facial and/or voice features. For example, a child's face may have particular features that can be identified (e.g., large eyes, a small nose, and a relatively small head, etc.). As such, a child context may be detected.
  • At block 508, computing device 106 may receive a selection of an avatar from the user. The user may select one of a plurality of different avatars through a UI of the avatar application being executed by computing device 106. The avatars may be selected via a scroll wheel, drop down menu, or icon menu (e.g., where each avatar is visible on the screen in its own position).
  • At block 510, computing device 106 may determine effects for rendering the audio and/or video files based at least in part on the context and the selected avatar. In this example, the effects for each avatar may be generated upon selection of each avatar, as opposed to all at once. In some instances, this will enable realization of significant processor and memory savings, because only one set of effects and avatar rendering will be performed at a time. These savings can be realized especially when the user does not select multiple avatars to preview.
• At block 512, computing device 106 may perform additional logic for additional effects, similar to that described above with respect to block 410 of FIG. 4. At block 514, computing device 106 may revise the audio and/or video information based at least in part on the determined effects and/or additional effects for the selected avatar, similar to that described above with respect to block 412 of FIG. 4. At block 516, computing device 106 may present the revised video with the revised audio based at least in part on the selected avatar, similar to that described above with respect to block 416 of FIG. 4.
  • In some examples, the avatar process 300 may determine whether to perform flow 400 or flow 500 based at least in part on historical information. For example, if the user generally uses the same avatar every time, flow 500 will be more efficient. However, if the user regularly switches between avatars, and previews multiple different avatars per video clip, then following flow 400 may be more efficient.
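One plausible way to make this choice, sketched below, is to track how many avatars the user previews per clip and pre-render for all avatars only when that history suggests frequent switching; the 1.5-preview threshold and function names are illustrative assumptions.

```swift
import Foundation

// Hypothetical history-based choice: pre-render effects for every avatar
// (flow 400) when the user tends to preview several avatars per clip,
// otherwise render on selection only (flow 500).
enum RenderStrategy { case preRenderAllAvatars, renderOnSelection }

func chooseStrategy(avatarsPreviewedPerClip history: [Int]) -> RenderStrategy {
    guard !history.isEmpty else { return .renderOnSelection }
    let average = Double(history.reduce(0, +)) / Double(history.count)
    // If the user usually previews more than one avatar, eager rendering pays off.
    return average > 1.5 ? .preRenderAllAvatars : .renderOnSelection
}
```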
• FIG. 6 illustrates an example UI 600 for enabling a user to utilize the avatar application (e.g., corresponding to avatar application affordance 602). In some examples, UI 600 may look different (e.g., it may appear as a standard text (e.g., short messaging service (SMS)) messaging application) until avatar application affordance 602 is selected. As noted, the avatar application can communicate with the avatar process (e.g., avatar process 300 of FIG. 3) to make requests for capturing, processing (e.g., extracting features, running logic, etc.), and adjusting audio and/or video. For example, when the user selects a record affordance (e.g., record/send video clip affordance 604), the avatar application may make an application programming interface (API) call to the avatar process to begin capturing video and audio information using the appropriate hardware components. In some examples, record/send video clip affordance 604 may be represented as a red circle (or a plain circle without the line shown in FIG. 6) prior to the recording session beginning. In this way, the affordance will look more like a standard record button. During the recording session, the appearance of record/send video clip affordance 604 may be changed to look like a clock countdown or other representation of a timer (e.g., if the length of video clip recordings is limited). However, in other examples, the record/send video clip affordance 604 may merely change colors to indicate that the avatar application is recording. If there is no timer, or limit on the length of the recording, the user may need to select record/send video clip affordance 604 again to terminate the recording.
• In some examples, a user may use avatar selection affordance 606 to select an avatar. This can be done before recording of the avatar video clip and/or after recording of the avatar video clip. When selected before recording, the initial preview of the user's motions and facial characteristics will be presented as the selected avatar. Additionally, the recording will be performed while presenting a live (e.g., real-time) preview of the recording, with the user's face being represented by the selected avatar. Once the recording is completed, a second preview (e.g., a replay of the actual recording) will be presented, again using the selected avatar. However, at this stage, the user can scroll through avatar selection affordance 606 to select a new avatar to view the recording preview. In some cases, upon selection of a new avatar, the UI will begin to preview the recording using the selected avatar. The new preview can be presented with the audio/video effects or as originally recorded. As noted, the determination regarding whether to present the effected version or the original may be based at least in part on the last method of playback used. For example, if the last playback used effects, the first playback after a new avatar selection may use effects. However, if the last playback did not use effects, the first playback after a new avatar selection may not use effects. In some examples, the user can replay the video clip with effects by selecting effects preview affordance 608 or without effects by selecting original preview affordance 610. Once satisfied with the video clip (e.g., the message), the user can send the avatar video in a message to another computing device using record/send video clip affordance 604. The video clip will be sent using the format corresponding to the last preview (e.g., with or without effects). At any time, if the user desires, delete video clip affordance 612 may be selected to delete the avatar video and either start over or exit the avatar and/or messaging applications.
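The affordance behavior described in the two paragraphs above can be summarized as a small state model, sketched below; the enum cases, labels, and the optional recording limit are hypothetical and only meant to capture the idle/recording/send-with-last-preview-format behavior.

```swift
import Foundation

// Hypothetical states for the record/send affordance: idle (plain or red
// circle), recording (countdown or color change), and ready-to-send once a
// preview exists, where the last preview format determines the send format.
enum RecordAffordanceState {
    case idle
    case recording(remaining: TimeInterval?)   // nil when no recording limit is set
    case readyToSend(withEffects: Bool)
}

func label(for state: RecordAffordanceState) -> String {
    switch state {
    case .idle:
        return "Record"
    case .recording(let remaining):
        return remaining.map { "Recording (\(Int($0)) s left)" } ?? "Recording (tap to stop)"
    case .readyToSend(let withEffects):
        return withEffects ? "Send (with effects)" : "Send (original)"
    }
}
```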
  • FIG. 7 illustrates an example flow diagram showing process (e.g., a computer-implemented method) 700 for implementing various audio and/or video effects based at least in part on audio and/or video features, according to at least a few embodiments. In some examples, computing device 106 of FIG. 1 or other similar user device (e.g., utilizing at least an avatar application similar to that shown in FIG. 6 and avatar process 300 of FIG. 3) may perform the process 700 of FIG. 7.
  • At block 702, computing device 106 may display a virtual avatar generation interface. The virtual avatar generation interface may look similar to the UI illustrated in FIG. 6. However, any UI configured to enable the same features described herein can be used.
• At block 704, computing device 106 may display first preview content of a virtual avatar. In some examples, the first preview content may be a real-time representation of the user's face, including movement and facial expressions. However, the first preview would provide an avatar (e.g., cartoon character, digital/virtual puppet) to represent the user's face instead of an image of the user's face. This first preview may be video only, or at least a rendering of the avatar without sound. In some examples, this first preview is not recorded and can be utilized for as long as the user desires, without limitation other than battery power or memory space of computing device 106.
  • At block 706, computing device 106 may detect selection of an input (e.g., record/send video clip affordance 604 of FIG. 6) in the virtual avatar generation interface. This selection may be made while the UI is displaying the first preview content.
  • At block 708, computing device 106 may begin capturing video and audio signals based at least in part on the input detected at block 706. As described, the video and audio signals may be captured by appropriate hardware components and can be captured by one or a combination of such components.
  • At block 710, computing device 106 may extract audio feature characteristics and facial feature characteristics as described in detail above. As noted, the extraction may be performed by particular modules of avatar process 300 of FIG. 3 or by other extraction and/or analysis components of the avatar application and/or computing device 106.
• At block 712, computing device 106 may generate an adjusted audio signal based at least in part on facial feature characteristics and audio feature characteristics. For example, the audio file captured at block 708 may be permanently (or temporarily) revised (e.g., adjusted) to include new sounds, new words, etc., and/or to have the original pitch, tone, volume, etc., adjusted. These adjustments can be made based at least in part on the context detected via analysis of the facial feature characteristics and audio feature characteristics. Additionally, the adjustments can be made based on the type of avatar selected and/or based on specific motions, facial expressions, words, phrases, or actions performed by the user (e.g., expressed by the user's face) during the recording session.
  • At block 714, computing device 106 may generate second preview content of the virtual avatar in the UI according to the adjusted audio signal. The generated second preview content may be based at least in part on the currently selected avatar or some default avatar. Once the second preview content is generated, computing device 106 can present the second preview content in the UI at block 716.
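An end-to-end sketch of process 700 under these assumptions is given below, with the capture, extraction, and adjustment stages passed in as placeholder closures; the types and signatures are illustrative and do not correspond to an actual device API.

```swift
import Foundation

// Hypothetical end-to-end shape of process 700: an input starts a capture
// session, features are extracted, the audio is adjusted from the combined
// audio and facial characteristics, and a second preview is produced.
struct CaptureResult { var videoMetadata: [String: Float]; var audio: [Float] }
struct Preview { let avatar: String; let audio: [Float] }

func runCaptureSession(capture: () -> CaptureResult,
                       extractAudioFeatures: ([Float]) -> [String: Float],
                       extractFacialFeatures: ([String: Float]) -> [String: Float],
                       adjustAudio: ([Float], [String: Float], [String: Float]) -> [Float],
                       selectedAvatar: String) -> Preview {
    let captured = capture()                                        // blocks 706-708
    let audioFeatures = extractAudioFeatures(captured.audio)        // block 710
    let facialFeatures = extractFacialFeatures(captured.videoMetadata)
    let adjusted = adjustAudio(captured.audio, audioFeatures, facialFeatures)  // block 712
    return Preview(avatar: selectedAvatar, audio: adjusted)         // blocks 714-716
}
```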
  • FIG. 8 illustrates an example flow diagram showing process (e.g., instructions stored on a computer-readable memory that can be executed) 800 for implementing various audio and/or video effects based at least in part on audio and/or video features, according to at least a few embodiments. In some examples, computing device 106 of FIG. 1 or other similar user device (e.g., utilizing at least an avatar application similar to that shown in FIG. 6 and avatar process 300 of FIG. 3) may perform the process 800 of FIG. 8.
• At block 802, computing device 106 may detect a request to generate an avatar video clip of a virtual avatar. In some examples, the request may be based at least in part on a user's selection of record/send video clip affordance 604 of FIG. 6.
  • At block 804, computing device 106 may capture a video signal associated with a face in the field of view of the camera. At block 806, computing device 106 may capture an audio signal corresponding to the video signal (e.g., coming from the face being captured by the camera).
• At block 808, computing device 106 may extract voice feature characteristics from the audio signal and at block 810, computing device 106 may extract facial feature characteristics from the video signal.
  • At block 812, computing device 106 may detect a request to preview the avatar video clip. This request may be based at least in part on a user's selection of a new avatar via avatar selection affordance 606 of FIG. 6 or based at least in part on a user's selection of effects preview affordance 608 of FIG. 6.
• At block 814, computing device 106 may generate an adjusted audio signal based at least in part on facial feature characteristics and voice feature characteristics. For example, the audio file captured at block 806 may be revised (e.g., adjusted) to include new sounds, new words, etc., and/or to have the original pitch, tone, volume, etc., adjusted. These adjustments can be made based at least in part on the context detected via analysis of the facial feature characteristics and voice feature characteristics. Additionally, the adjustments can be made based on the type of avatar selected and/or based on specific motions, facial expressions, words, phrases, or actions performed by the user (e.g., expressed by the user's face) during the recording session.
• At block 816, computing device 106 may generate a preview of the virtual avatar in the UI according to the adjusted audio signal. The generated preview may be based at least in part on the currently selected avatar or some default avatar. Once the preview is generated, computing device 106 can also present the preview in the UI at block 816.
  • FIG. 9 is a simplified block diagram illustrating example architecture 900 for implementing the features described herein, according to at least one embodiment. In some examples, computing device 902 (e.g., computing device 106 of FIG. 1), having example architecture 900, may be configured to present relevant UIs, capture audio and video information, extract relevant data, perform logic, revise the audio and video information, and present animoji videos.
• Computing device 902 may be configured to execute or otherwise manage applications or instructions for performing the described techniques such as, but not limited to, providing a user interface (e.g., user interface 600 of FIG. 6) for recording, previewing, and/or sending virtual avatar video clips. Computing device 902 may receive inputs (e.g., utilizing I/O device(s) 904 such as a touch screen) from a user at the user interface, capture information, process the information, and then present the video clips as previews also utilizing I/O device(s) 904 (e.g., a speaker of computing device 902). Computing device 902 may be configured to revise audio and/or video files based at least in part on facial features extracted from the captured video and/or voice features extracted from the captured audio.
  • Computing device 902 may be any type of computing device such as, but not limited to, a mobile phone (e.g., a smartphone), a tablet computer, a personal digital assistant (PDA), a laptop computer, a desktop computer, a thin-client device, a smart watch, a wireless headset, or the like.
  • In one illustrative configuration, computing device 902 may include at least one memory 914 and one or more processing units (or processor(s)) 916. Processor(s) 916 may be implemented as appropriate in hardware, computer-executable instructions, or combinations thereof. Computer-executable instruction or firmware implementations of processor(s) 916 may include computer-executable or machine-executable instructions written in any suitable programming language to perform the various functions described.
  • Memory 914 may store program instructions that are loadable and executable on processor(s) 916, as well as data generated during the execution of these programs. Depending on the configuration and type of computing device 902, memory 914 may be volatile (such as random access memory (RAM)) and/or non-volatile (such as read-only memory (ROM), flash memory, etc.). Computing device 902 may also include additional removable storage and/or non-removable storage 926 including, but not limited to, magnetic storage, optical disks, and/or tape storage. The disk drives and their associated non-transitory computer-readable media may provide non-volatile storage of computer-readable instructions, data structures, program modules, and other data for the computing devices. In some implementations, memory 914 may include multiple different types of memory, such as static random access memory (SRAM), dynamic random access memory (DRAM), or ROM. While the volatile memory described herein may be referred to as RAM, any volatile memory that would not maintain data stored therein once unplugged from a host and/or power would be appropriate.
  • Memory 914 and additional storage 926, both removable and non-removable, are all examples of non-transitory computer-readable storage media. For example, non-transitory computer readable storage media may include volatile or non-volatile, removable or non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Memory 914 and additional storage 926 are both examples of non-transitory computer storage media. Additional types of computer storage media that may be present in computing device 902 may include, but are not limited to, phase-change RAM (PRAM), SRAM, DRAM, RAM, ROM, electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital video disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by computing device 902. Combinations of any of the above should also be included within the scope of non-transitory computer-readable storage media.
  • Alternatively, computer-readable communication media may include computer-readable instructions, program modules, or other data transmitted within a data signal, such as a carrier wave, or other transmission. However, as used herein, computer-readable storage media does not include computer-readable communication media.
  • Computing device 902 may also contain communications connection(s) 928 that allow computing device 902 to communicate with a data store, another computing device or server, user terminals and/or other devices via one or more networks. Such networks may include any one or a combination of many different types of networks, such as cable networks, the Internet, wireless networks, cellular networks, satellite networks, other private and/or public networks, or any combination thereof. Computing device 902 may also include I/O device(s) 904, such as a touch input device, a keyboard, a mouse, a pen, a voice input device, a display, a speaker, a printer, etc.
  • Turning to the contents of memory 914 in more detail, memory 914 may include operating system 932 and/or one or more application programs or services for implementing the features disclosed herein including user interface module 934, avatar control module 936, avatar application module 938, and messaging module 940. Memory 914 may also be configured to store one or more audio and video files to be used to produce audio and video output. In this way, computing device 902 can perform all of the operations described herein.
• In some examples, user interface module 934 may be configured to manage the user interface of computing device 902. For example, user interface module 934 may present any number of various UIs requested by computing device 902. In particular, user interface module 934 may be configured to present UI 600 of FIG. 6, which enables implementation of the features described herein, including communication with avatar process 300 of FIG. 3, which is responsible for capturing video and audio information, extracting appropriate facial feature and voice feature information, and revising the video and audio information prior to presentation of the generated avatar video clips as described above.
  • In some examples, avatar control module 936 is configured to implement (e.g., execute instructions for implementing) avatar process 300 while avatar application module 938 is configured to implement the user facing application. As noted above, avatar application module 938 may utilize one or more APIs for requesting and/or providing information to avatar control module 936.
• In some embodiments, messaging module 940 may implement any standalone or add-on messaging application that can communicate with avatar control module 936 and/or avatar application module 938. In some examples, messaging module 940 may be fully integrated with avatar application module 938 (e.g., as seen in UI 600 of FIG. 6), where the avatar application appears to be part of the messaging application. However, in other examples, messaging module 940 may call avatar application module 938 when a user requests to generate an avatar video clip, and avatar application module 938 may open up a new application altogether that is integrated with messaging module 940.
  • Computing device 902 may also be equipped with a camera and microphone, as shown in at least FIG. 3, and processors 916 may be configured to execute instructions to display a first preview of a virtual avatar. In some examples, while displaying the first preview of a virtual avatar, an input may be detected via a virtual avatar generation interface presented by user interface module 934. In some instances, in response to detecting the input in the virtual avatar generation interface, avatar control module 936 may initiate a capture session including: capturing, via the camera, a video signal associated with a face in a field of view of the camera, capturing, via the microphone, an audio signal associated with the captured video signal, extracting audio feature characteristics from the captured audio signal, and extracting facial feature characteristics associated with the face from the captured video signal. Additionally, in response to detecting expiration of the capture session, avatar control module 936 may generate an adjusted audio signal based at least in part on the audio feature characteristics and the facial feature characteristics, and display a second preview of the virtual avatar in the virtual avatar generation interface according to the facial feature characteristics and the adjusted audio signal.
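  • The capture-session flow just described can be summarized in a compact sketch. The following Python fragment is a minimal illustration under assumed inputs (a mono sample buffer and a list of per-frame smile scores); the specific mapping from facial expression to pitch and level is an invented example of adjusting audio based on facial feature characteristics, not the mapping used by the described embodiments, and a shipping implementation would use the device's real capture and rendering stacks.

```python
import numpy as np

def adjust_audio(samples: np.ndarray, smile_score: float, max_gain_db: float = 3.0) -> np.ndarray:
    """Toy adjusted audio signal: raise pitch and level as the smile score grows."""
    pitch_ratio = 1.0 + 0.3 * smile_score                  # up to +30% pitch for a full smile
    # Resample by fractional indexing; the shorter buffer plays back faster and higher.
    idx = np.arange(0, len(samples), pitch_ratio)
    shifted = np.interp(idx, np.arange(len(samples)), samples)
    gain = 10.0 ** ((max_gain_db * smile_score) / 20.0)
    return np.clip(shifted * gain, -1.0, 1.0)

def run_capture_session(audio: np.ndarray, smile_scores: list) -> dict:
    """On expiration: aggregate facial features, adjust the audio, build the second preview."""
    mean_smile = float(np.mean(smile_scores)) if smile_scores else 0.0
    adjusted = adjust_audio(audio, mean_smile)
    return {"facial_features": {"smile": mean_smile}, "adjusted_audio": adjusted}

if __name__ == "__main__":
    rate = 16_000
    t = np.arange(rate) / rate
    voice = 0.2 * np.sin(2 * np.pi * 180 * t)              # stand-in for the captured voice
    preview = run_capture_session(voice, smile_scores=[0.2, 0.6, 0.9])
    print(preview["facial_features"], preview["adjusted_audio"].shape)
```

  Played back at the original sample rate, the shortened buffer yields a crude pitch and playback-speed change of the sort enumerated in the claims below (level, pitch, duration, variable playback speed); a more faithful effect would use a proper time-scale or formant-preserving pitch modification.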
  • Illustrative methods, computer-readable media, and systems for providing various techniques for adjusting audio and/or video content based at least in part on voice and/or facial feature characteristics are described above. Some or all of these systems, media, and methods may, but need not, be implemented at least partially by architectures and flows such as those shown at least in FIGS. 1-9 above. While many of the embodiments are described above with reference to messaging applications, it should be understood that any of the above techniques can be used within any type of application, including real-time video playback or real-time video messaging applications. For purposes of explanation, specific configurations and details are set forth in order to provide a thorough understanding of the examples. However, it should also be apparent to one skilled in the art that the examples may be practiced without the specific details. Furthermore, well-known features were sometimes omitted or simplified in order not to obscure the example being described.
  • The various embodiments further can be implemented in a wide variety of operating environments, which in some cases can include one or more user computers, computing devices or processing devices which can be used to operate any of a number of applications. User or client devices can include any of a number of general purpose personal computers, such as desktop or laptop computers running a standard operating system, as well as cellular, wireless and handheld devices running mobile software and capable of supporting a number of networking and messaging protocols. Such a system also can include a number of workstations running any of a variety of commercially-available operating systems and other known applications for purposes such as development and database management. These devices also can include other electronic devices, such as dummy terminals, thin-clients, gaming systems and other devices capable of communicating via a network.
  • Most embodiments utilize at least one network that would be familiar to those skilled in the art for supporting communications using any of a variety of commercially-available protocols, such as TCP/IP, OSI, FTP, UPnP, NFS, CIFS, and AppleTalk. The network can be, for example, a local area network, a wide-area network, a virtual private network, the Internet, an intranet, an extranet, a public switched telephone network, an infrared network, a wireless network, and any combination thereof.
  • In embodiments utilizing a network server, the network server can run any of a variety of server or mid-tier applications, including HTTP servers, FTP servers, CGI servers, data servers, Java servers, and business application servers. The server(s) also may be capable of executing programs or scripts in response to requests from user devices, such as by executing one or more applications that may be implemented as one or more scripts or programs written in any programming language, such as Java®, C, C# or C++, or any scripting language, such as Perl, Python or TCL, as well as combinations thereof. The server(s) may also include database servers, including without limitation those commercially available from Oracle®, Microsoft®, Sybase®, and IBM®.
  • The environment can include a variety of data stores and other memory and storage media as discussed above. These can reside in a variety of locations, such as on a storage medium local to (and/or resident in) one or more of the computers or remote from any or all of the computers across the network. In a particular set of embodiments, the information may reside in a storage-area network (SAN) familiar to those skilled in the art. Similarly, any necessary files for performing the functions attributed to the computers, servers or other network devices may be stored locally and/or remotely, as appropriate. Where a system includes computerized devices, each such device can include hardware elements that may be electrically coupled via a bus, the elements including, for example, at least one central processing unit (CPU), at least one input device (e.g., a mouse, keyboard, controller, touch screen or keypad), and at least one output device (e.g., a display device, printer or speaker). Such a system may also include one or more storage devices, such as disk drives, optical storage devices, and solid-state storage devices such as RAM or ROM, as well as removable media devices, memory cards, flash cards, etc.
  • Such devices also can include a computer-readable storage media reader, a communications device (e.g., a modem, a network card (wireless or wired), an infrared communication device, etc.), and working memory as described above. The computer-readable storage media reader can be connected with, or configured to receive, a non-transitory computer-readable storage medium, representing remote, local, fixed, and/or removable storage devices as well as storage media for temporarily and/or more permanently containing, storing, transmitting, and retrieving computer-readable information. The system and various devices also typically will include a number of software applications, modules, services or other elements located within at least one working memory device, including an operating system and application programs, such as a client application or browser. It should be appreciated that alternate embodiments may have numerous variations from that described above. For example, customized hardware might also be used and/or particular elements might be implemented in hardware, software (including portable software, such as applets) or both. Further, connection to other computing devices such as network input/output devices may be employed.
  • Non-transitory storage media and computer-readable storage media for containing code, or portions of code, can include any appropriate media known or used in the art (except for transitory media like carrier waves or the like) such as, but not limited to, volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data, including RAM, ROM, Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, CD-ROM, DVD or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices or any other medium which can be used to store the desired information and which can be accessed by a system device. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will appreciate other ways and/or methods to implement the various embodiments. However, as noted above, computer-readable storage media does not include transitory media such as carrier waves or the like.
  • The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. It will, however, be evident that various modifications and changes may be made thereunto without departing from the broader spirit and scope of the disclosure as set forth in the claims.
  • Other variations are within the spirit of the present disclosure. Thus, while the disclosed techniques are susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the disclosure to the specific form or forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions and equivalents falling within the spirit and scope of the disclosure, as defined in the appended claims.
  • The use of the terms “a,” “an,” and “the,” and similar referents in the context of describing the disclosed embodiments (especially in the context of the following claims), is to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The terms “comprising,” “having,” “including,” and “containing” are to be construed as open-ended terms (i.e., meaning “including, but not limited to,”) unless otherwise noted. The term “connected” is to be construed as partly or wholly contained within, attached to, or joined together, even if there is something intervening. The phrase “based on” should be understood to be open-ended, and not limiting in any way, and is intended to be interpreted or otherwise be read as “based at least in part on,” where appropriate. Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate embodiments of the disclosure and does not pose a limitation on the scope of the disclosure unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the disclosure.
  • Disjunctive language such as the phrase “at least one of X, Y, or Z,” unless specifically stated otherwise, is otherwise understood within the context as used in general to present that an item, term, etc., may be either X, Y, or Z, or any combination thereof (e.g., X, Y, and/or Z). Thus, such disjunctive language is not generally intended to, and should not, imply that certain embodiments require at least one of X, at least one of Y, or at least one of Z to each be present. Additionally, conjunctive language such as the phrase “at least one of X, Y, and Z,” unless specifically stated otherwise, should also be understood to mean X, Y, Z, or any combination thereof, including “X, Y, and/or Z.”
  • Preferred embodiments of this disclosure are described herein, including the best mode known to the inventors for carrying out the disclosure. Variations of those preferred embodiments may become apparent to those of ordinary skill in the art upon reading the foregoing description. The inventors expect skilled artisans to employ such variations as appropriate, and the inventors intend for the disclosure to be practiced otherwise than as specifically described herein. Accordingly, this disclosure includes all modifications and equivalents of the subject matter recited in the claims appended hereto as permitted by applicable law. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the disclosure unless otherwise indicated herein or otherwise clearly contradicted by context.
  • All references, including publications, patent applications, and patents, cited herein are hereby incorporated by reference to the same extent as if each reference were individually and specifically indicated to be incorporated by reference and were set forth in its entirety herein.

Claims (20)

What is claimed is:
1. A method, comprising:
at an electronic device having at least a camera and a microphone:
displaying a virtual avatar generation interface;
displaying first preview content of a virtual avatar in the virtual avatar generation interface, the first preview content of the virtual avatar corresponding to realtime preview video frames of a user headshot in a field of view of the camera and associated headshot changes in an appearance;
while displaying the first preview content of the virtual avatar, detecting an input in the virtual avatar generation interface;
in response to detecting the input in the virtual avatar generation interface:
capturing, via the camera, a video signal associated with the user headshot during a recording session;
capturing, via the microphone, a user audio signal during the recording session;
extracting audio feature characteristics from the captured user audio signal; and
extracting facial feature characteristics associated with the face from the captured video signal; and
in response to detecting expiration of the recording session:
generating an adjusted audio signal from the captured audio signal based at least in part on the facial feature characteristics and the audio feature characteristics;
generating second preview content of the virtual avatar in the virtual avatar generation interface according to the facial feature characteristics and the adjusted audio signal; and
presenting the second preview content in the virtual avatar generation interface.
2. The method of claim 1, further comprising storing facial feature metadata associated with the facial feature characteristics extracted from the video signal and storing audio metadata associated with the audio feature characteristics extracted from the audio signal.
3. The method of claim 2, further comprising generating adjusted facial feature metadata from the facial feature metadata based at least in part on the facial feature characteristics and the audio feature characteristics.
4. The method of claim 3, wherein the second preview of the virtual avatar is displayed further according to the adjusted facial feature metadata.
5. An electronic device, comprising:
a camera;
a microphone; and
one or more processors in communication with the camera and the microphone, the one or more processors configured to:
while displaying a first preview of a virtual avatar, detecting an input in a virtual avatar generation interface;
in response to detecting the input in the virtual avatar generation interface, initiating a capture session including:
capturing, via the camera, a video signal associated with a face in a field of view of the camera;
capturing, via the microphone, an audio signal associated with the captured video signal;
extracting audio feature characteristics from the captured audio signal; and
extracting facial feature characteristics associated with the face from the captured video signal; and
in response to detecting expiration of the capture session:
generating an adjusted audio signal based at least in part on the audio feature characteristics and the facial feature characteristics; and
displaying a second preview of the virtual avatar in the virtual avatar generation interface according to the facial feature characteristics and the adjusted audio signal.
6. The electronic device of claim 5, wherein the audio signal is further adjusted based at least in part on a type of the virtual avatar.
7. The electronic device of claim 6, wherein the type of the virtual avatar is received based at least in part on an avatar type selection affordance presented in the virtual avatar generation interface.
8. The electronic device of claim 6, wherein the type of the virtual avatar includes an animal type, and wherein the adjusted audio signal is generated based at least in part on a predetermined sound associated with the animal type.
9. The electronic device of claim 5, wherein the one or more processors are further configured to determine whether a portion of the audio signal corresponds to the face in the field of view.
10. The electronic device of claim 9, wherein the one or more processors are further configured to, in accordance with a determination that the portion of the audio signal corresponds to the face, store the portion of the audio signal for use in generating the adjusted audio signal.
11. The electronic device of claim 9, wherein the one or more processors are further configured to, in accordance with a determination that the portion of the audio signal does not correspond to the face, discard at least the portion of the audio signal.
12. The electronic device of claim 5, wherein the audio feature characteristics comprise features of a voice associated with the face in the field of view.
13. The electronic device of claim 5, wherein the one or more processors are further configured to store facial feature metadata associated with the facial feature characteristics extracted from the video signal.
14. The electronic device of claim 13, wherein the one or more processors are further configured to generate adjusted facial metadata based at least in part on the facial feature characteristics and the audio feature characteristics.
15. The electronic device of claim 14, wherein the second preview of the virtual avatar is generated according to the adjusted facial metadata and the adjusted audio signal.
16. A computer-readable storage medium storing computer-executable instructions that, when executed by one or more processors, configure the one or more processors to perform operations comprising:
in response to detecting a request to generate an avatar video clip of a virtual avatar:
capturing, via a camera of an electronic device, a video signal associated with a face in a field of view of the camera;
capturing, via a microphone of the electronic device, an audio signal;
extracting voice feature characteristics from the captured audio signal; and
extracting facial feature characteristics associated with the face from the captured video signal; and
in response to detecting a request to preview the avatar video clip:
generating an adjusted audio signal based at least in part on the facial feature characteristics and the voice feature characteristics; and
displaying a preview of the video clip of the virtual avatar using the adjusted audio signal.
17. The computer-readable storage medium of claim 16, wherein the audio signal is adjusted based at least in part on a facial expression identified in the facial feature characteristics associated with the face.
18. The computer-readable storage medium of claim 16, wherein the adjusted audio signal is further adjusted by inserting one or more pre-stored audio samples.
19. The computer-readable storage medium of claim 16, wherein the audio signal is adjusted based at least in part on a level, pitch, duration, variable playback speed, speech spectral-formant positions, speech spectral-formant levels, instantaneous playback speed, or change in a voice associated with the face.
20. The computer-readable storage medium of claim 16, wherein the one or more processors are further configured to perform the operations comprising transmitting the video clip of the virtual avatar to another electronic device.
US15/908,603 2017-05-16 2018-02-28 Voice effects based on facial expressions Abandoned US20180336716A1 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
US15/908,603 US20180336716A1 (en) 2017-05-16 2018-02-28 Voice effects based on facial expressions
US16/033,111 US10861210B2 (en) 2017-05-16 2018-07-11 Techniques for providing audio and video effects
KR1020207022657A KR102367143B1 (en) 2018-02-28 2019-02-26 Voice effects based on facial expressions
CN201980016107.6A CN111787986A (en) 2018-02-28 2019-02-26 Voice effects based on facial expressions
DE112019001058.1T DE112019001058T5 (en) 2018-02-28 2019-02-26 VOICE EFFECTS BASED ON FACIAL EXPRESSIONS
PCT/US2019/019554 WO2019168834A1 (en) 2018-02-28 2019-02-26 Voice effects based on facial expressions

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201762507177P 2017-05-16 2017-05-16
US201762556412P 2017-09-09 2017-09-09
US201762557121P 2017-09-11 2017-09-11
US15/908,603 US20180336716A1 (en) 2017-05-16 2018-02-28 Voice effects based on facial expressions

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/033,111 Continuation-In-Part US10861210B2 (en) 2017-05-16 2018-07-11 Techniques for providing audio and video effects

Publications (1)

Publication Number Publication Date
US20180336716A1 (en) 2018-11-22

Family

ID=64269597

Family Applications (4)

Application Number Title Priority Date Filing Date
US15/870,195 Active US10521091B2 (en) 2017-05-16 2018-01-12 Emoji recording and sending
US15/908,603 Abandoned US20180336716A1 (en) 2017-05-16 2018-02-28 Voice effects based on facial expressions
US15/940,017 Active US10845968B2 (en) 2017-05-16 2018-03-29 Emoji recording and sending
US15/940,232 Active US10379719B2 (en) 2017-05-16 2018-03-29 Emoji recording and sending

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US15/870,195 Active US10521091B2 (en) 2017-05-16 2018-01-12 Emoji recording and sending

Family Applications After (2)

Application Number Title Priority Date Filing Date
US15/940,017 Active US10845968B2 (en) 2017-05-16 2018-03-29 Emoji recording and sending
US15/940,232 Active US10379719B2 (en) 2017-05-16 2018-03-29 Emoji recording and sending

Country Status (3)

Country Link
US (4) US10521091B2 (en)
CN (2) CN111563943B (en)
DK (2) DK179867B1 (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040179037A1 (en) * 2003-03-03 2004-09-16 Blattner Patrick D. Using avatars to communicate context out-of-band
US20040250210A1 (en) * 2001-11-27 2004-12-09 Ding Huang Method for customizing avatars and heightening online safety
US20070115349A1 (en) * 2005-11-03 2007-05-24 Currivan Bruce J Method and system of tracking and stabilizing an image transmitted using video telephony
US20070260984A1 (en) * 2006-05-07 2007-11-08 Sony Computer Entertainment Inc. Methods for interactive communications with real time effects and avatar environment interaction
US20140092130A1 (en) * 2012-09-28 2014-04-03 Glen J. Anderson Selectively augmenting communications transmitted by a communication device
US20140282000A1 (en) * 2013-03-15 2014-09-18 Tawfiq AlMaghlouth Animated character conversation generator
US20150156598A1 (en) * 2013-12-03 2015-06-04 Cisco Technology, Inc. Microphone mute/unmute notification
US20150379752A1 (en) * 2013-03-20 2015-12-31 Intel Corporation Avatar-based transfer protocols, icon generation and doll animation
US20160006987A1 (en) * 2012-09-06 2016-01-07 Wenlong Li System and method for avatar creation and synchronization
US20160110922A1 (en) * 2014-10-16 2016-04-21 Tal Michael HARING Method and system for enhancing communication by using augmented reality

Family Cites Families (208)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2918499B2 (en) 1996-09-17 1999-07-12 株式会社エイ・ティ・アール人間情報通信研究所 Face image information conversion method and face image information conversion device
US6173402B1 (en) 1998-03-04 2001-01-09 International Business Machines Corporation Technique for localizing keyphrase-based data encryption and decryption
DE69910757T2 (en) 1998-04-13 2004-06-17 Eyematic Interfaces, Inc., Santa Monica WAVELET-BASED FACIAL MOTION DETECTION FOR AVATAR ANIMATION
JP2001092783A (en) 1999-09-27 2001-04-06 Hitachi Software Eng Co Ltd Method and system for personal authentication, and recording medium
KR20010056965A (en) 1999-12-17 2001-07-04 박희완 Method for creating human characters by partial image synthesis
US20010047365A1 (en) 2000-04-19 2001-11-29 Hiawatha Island Software Co, Inc. System and method of packaging and unpackaging files into a markup language record for network search and archive services
US9064344B2 (en) 2009-03-01 2015-06-23 Facecake Technologies, Inc. Image transformation systems and methods
JP2003150550A (en) 2001-11-14 2003-05-23 Toshiba Corp Information processing system
US7227976B1 (en) 2002-07-08 2007-06-05 Videomining Corporation Method and system for real-time facial image enhancement
GB0220748D0 (en) 2002-09-06 2002-10-16 Saw You Com Ltd Improved communication using avatars
US7180524B1 (en) 2002-09-30 2007-02-20 Dale Axelrod Artists' color display system
US7908554B1 (en) 2003-03-03 2011-03-15 Aol Inc. Modifying avatar behavior based on user action or mood
JP2005115480A (en) 2003-10-03 2005-04-28 Toshiba Social Automation Systems Co Ltd Authentication system and computer readable storage medium
US7969447B2 (en) 2004-05-06 2011-06-28 Pixar Dynamic wrinkle mapping
JP4449723B2 (en) 2004-12-08 2010-04-14 ソニー株式会社 Image processing apparatus, image processing method, and program
KR100511210B1 (en) 2004-12-27 2005-08-30 주식회사지앤지커머스 Method for converting 2d image into pseudo 3d image and user-adapted total coordination method in use artificial intelligence, and service besiness method thereof
US8488023B2 (en) 2009-05-20 2013-07-16 DigitalOptics Corporation Europe Limited Identifying facial expressions in acquired digital images
US20060294465A1 (en) 2005-06-22 2006-12-28 Comverse, Inc. Method and system for creating and distributing mobile avatars
US8963926B2 (en) 2006-07-11 2015-02-24 Pandoodle Corporation User customized animated video and method for making the same
JP2007052770A (en) 2005-07-21 2007-03-01 Omron Corp Monitoring apparatus
JP2007036928A (en) 2005-07-29 2007-02-08 Sharp Corp Mobile information terminal device
US8370360B2 (en) 2005-12-31 2013-02-05 G & G Commerce Ltd. Merchandise recommending system and method thereof
US8620038B2 (en) 2006-05-05 2013-12-31 Parham Aarabi Method, system and computer program product for automatic and semi-automatic modification of digital images of faces
US20080052242A1 (en) 2006-08-23 2008-02-28 Gofigure! Llc Systems and methods for exchanging graphics between communication devices
WO2008064483A1 (en) 2006-11-30 2008-06-05 James Andrew Wanless A method and system for providing automated real-time contact information
US8705720B2 (en) 2007-02-08 2014-04-22 Avaya Inc. System, method and apparatus for clientless two factor authentication in VoIP networks
US20080242423A1 (en) 2007-03-27 2008-10-02 Shelford Securities, S.A. Real-money online multi-player trivia system, methods of operation, and storage medium
JP5219184B2 (en) 2007-04-24 2013-06-26 任天堂株式会社 Training program, training apparatus, training system, and training method
US20080300572A1 (en) 2007-06-01 2008-12-04 Medtronic Minimed, Inc. Wireless monitor for a personal medical device system
GB2450757A (en) 2007-07-06 2009-01-07 Sony Comp Entertainment Europe Avatar customisation, transmission and reception
US9596308B2 (en) 2007-07-25 2017-03-14 Yahoo! Inc. Display of person based information including person notes
US8726194B2 (en) 2007-07-27 2014-05-13 Qualcomm Incorporated Item selection using enhanced control
US8146005B2 (en) 2007-08-07 2012-03-27 International Business Machines Corporation Creating a customized avatar that reflects a user's distinguishable attributes
US20090055484A1 (en) 2007-08-20 2009-02-26 Thanh Vuong System and method for representation of electronic mail users using avatars
US20090135177A1 (en) 2007-11-20 2009-05-28 Big Stage Entertainment, Inc. Systems and methods for voice personalization of video content
CN101472158A (en) * 2007-12-27 2009-07-01 上海银晨智能识别科技有限公司 Network photographic device based on human face detection and image forming method
US8600120B2 (en) 2008-01-03 2013-12-03 Apple Inc. Personal computing device control using face detection and recognition
NZ587526A (en) 2008-01-31 2013-06-28 Univ Southern California Facial performance synthesis using deformation driven polynomial displacement maps
EP2263190A2 (en) 2008-02-13 2010-12-22 Ubisoft Entertainment S.A. Live-action image capture
US20160032887A1 (en) * 2008-02-25 2016-02-04 Roland Wayne Patton Method and apparatus for converting energy in a moving fluid mass to rotational energy drving a transmission
US8169438B1 (en) 2008-03-31 2012-05-01 Pixar Temporally coherent hair deformation
JP5383668B2 (en) 2008-04-30 2014-01-08 株式会社アクロディア Character display data generating apparatus and method
US20120081282A1 (en) 2008-05-17 2012-04-05 Chin David H Access of an application of an electronic device based on a facial gesture
US8401284B2 (en) 2008-05-28 2013-03-19 Apple Inc. Color correcting method and apparatus
CN102046249B (en) 2008-06-02 2015-12-16 耐克创新有限合伙公司 Create the system and method for incarnation
JP2010028404A (en) 2008-07-18 2010-02-04 Hitachi Ltd Recording and reproducing device
US10983665B2 (en) 2008-08-01 2021-04-20 Samsung Electronics Co., Ltd. Electronic apparatus and method for implementing user interface
US20100153847A1 (en) * 2008-12-17 2010-06-17 Sony Computer Entertainment America Inc. User deformation of movie character images
US20100169376A1 (en) 2008-12-29 2010-07-01 Yahoo! Inc. Visual search engine for personal dating
US8289130B2 (en) 2009-02-19 2012-10-16 Apple Inc. Systems and methods for identifying unauthorized users of an electronic device
CN101930284B (en) 2009-06-23 2014-04-09 腾讯科技(深圳)有限公司 Method, device and system for implementing interaction between video and virtual network scene
KR101651128B1 (en) * 2009-10-05 2016-08-25 엘지전자 주식회사 Mobile terminal and method for controlling application execution thereof
TWI439960B (en) 2010-04-07 2014-06-01 Apple Inc Avatar editing environment
US9542038B2 (en) 2010-04-07 2017-01-10 Apple Inc. Personalizing colors of user interfaces
US8694899B2 (en) 2010-06-01 2014-04-08 Apple Inc. Avatars reflecting user states
US20170098122A1 (en) 2010-06-07 2017-04-06 Affectiva, Inc. Analysis of image content with associated manipulation of expression presentation
US20110304629A1 (en) 2010-06-09 2011-12-15 Microsoft Corporation Real-time animation of facial expressions
KR20120013727A (en) 2010-08-06 2012-02-15 삼성전자주식회사 Display apparatus and control method thereof
US20120069028A1 (en) 2010-09-20 2012-03-22 Yahoo! Inc. Real-time animations of emoticons using facial recognition during a video chat
US8830226B2 (en) 2010-09-28 2014-09-09 Apple Inc. Systems, methods, and computer-readable media for integrating a three-dimensional asset with a three-dimensional model
US9519396B2 (en) 2010-09-28 2016-12-13 Apple Inc. Systems, methods, and computer-readable media for placing an asset on a three-dimensional model
US8558844B2 (en) 2010-09-28 2013-10-15 Apple Inc. Systems, methods, and computer-readable media for changing colors of displayed assets
EP3920465B1 (en) 2010-10-08 2023-12-06 Brian Lee Moffat Private data sharing system
CN102479388A (en) 2010-11-22 2012-05-30 北京盛开互动科技有限公司 Expression interaction method based on face tracking and analysis
KR20120059994A (en) * 2010-12-01 2012-06-11 삼성전자주식회사 Apparatus and method for control avatar using expression control point
US9563703B2 (en) 2011-03-10 2017-02-07 Cox Communications, Inc. System, method and device for sharing of playlists of authorized content with other users
US20120289290A1 (en) * 2011-05-12 2012-11-15 KT Corporation, KT TECH INC. Transferring objects between application windows displayed on mobile terminal
US9013489B2 (en) 2011-06-06 2015-04-21 Microsoft Technology Licensing, Llc Generation of avatar reflecting player appearance
US9082235B2 (en) 2011-07-12 2015-07-14 Microsoft Technology Licensing, Llc Using facial data for device authentication or subject identification
CN102999934A (en) * 2011-09-19 2013-03-27 上海威塔数字科技有限公司 Three-dimensional animation system of computer and animation method
US10262327B1 (en) 2011-09-22 2019-04-16 Glance Networks, Inc. Integrating screen sharing sessions with customer relationship management
US8867849B1 (en) 2011-10-05 2014-10-21 Google Inc. Suggesting profile images for a social network
CN103116902A (en) 2011-11-16 2013-05-22 华为软件技术有限公司 Three-dimensional virtual human head image generation method, and method and device of human head image motion tracking
US20130147933A1 (en) 2011-12-09 2013-06-13 Charles J. Kulas User image insertion into a text message
CN103164117A (en) 2011-12-09 2013-06-19 富泰华工业(深圳)有限公司 External operating device, electronic device and delayed screen locking method thereof
US9292195B2 (en) 2011-12-29 2016-03-22 Apple Inc. Device, method, and graphical user interface for configuring and implementing restricted interactions for applications
WO2013097139A1 (en) 2011-12-29 2013-07-04 Intel Corporation Communication using avatar
EP2811628B1 (en) * 2012-02-01 2015-11-18 Nissan Motor Co., Ltd. Method for manufacturing magnet pieces for forming field-pole magnets
US9747495B2 (en) * 2012-03-06 2017-08-29 Adobe Systems Incorporated Systems and methods for creating and distributing modifiable animated video messages
US9251360B2 (en) 2012-04-27 2016-02-02 Intralinks, Inc. Computerized method and system for managing secure mobile device content viewing in a networked secure collaborative exchange environment
WO2013152453A1 (en) 2012-04-09 2013-10-17 Intel Corporation Communication using interactive avatars
CN111275795A (en) 2012-04-09 2020-06-12 英特尔公司 System and method for avatar generation, rendering and animation
WO2013152454A1 (en) 2012-04-09 2013-10-17 Intel Corporation System and method for avatar management and selection
US8254647B1 (en) 2012-04-16 2012-08-28 Google Inc. Facial image quality assessment
US9104908B1 (en) 2012-05-22 2015-08-11 Image Metrics Limited Building systems for adaptive tracking of facial features across individuals and groups
US20130342672A1 (en) 2012-06-25 2013-12-26 Amazon Technologies, Inc. Using gaze determination with device input
US20140013422A1 (en) 2012-07-03 2014-01-09 Scott Janus Continuous Multi-factor Authentication
EP2682739A1 (en) * 2012-07-05 2014-01-08 Atlas Material Testing Technology GmbH Weathering test for various UV wavelengths of UV light emitting diodes
CN102799383B (en) 2012-07-18 2014-05-14 腾讯科技(深圳)有限公司 Screen sectional drawing method and screen sectional drawing device for mobile terminals
US20140078144A1 (en) 2012-09-14 2014-03-20 Squee, Inc. Systems and methods for avatar creation
US9826286B2 (en) * 2012-09-18 2017-11-21 Viacom International Inc. Video editing method and tool
US9314692B2 (en) * 2012-09-21 2016-04-19 Luxand, Inc. Method of creating avatar from user submitted image
KR102013443B1 (en) * 2012-09-25 2019-08-22 삼성전자주식회사 Method for transmitting for image and an electronic device thereof
KR102001913B1 (en) 2012-09-27 2019-07-19 엘지전자 주식회사 Mobile Terminal and Operating Method for the Same
JP5964190B2 (en) 2012-09-27 2016-08-03 京セラ株式会社 Terminal device
US9696898B2 (en) 2012-11-14 2017-07-04 Facebook, Inc. Scrolling through a series of content items
US20140157153A1 (en) 2012-12-05 2014-06-05 Jenny Yuen Select User Avatar on Detected Emotion
US9466142B2 (en) 2012-12-17 2016-10-11 Intel Corporation Facial movement based avatar animation
KR102049855B1 (en) * 2013-01-31 2019-11-28 엘지전자 주식회사 Mobile terminal and controlling method thereof
US9148489B2 (en) 2013-03-11 2015-09-29 Qualcomm Incorporated Exchanging a contact profile between client devices during a communication session
US9298361B2 (en) 2013-03-15 2016-03-29 Apple Inc. Analyzing applications for different access modes
US9747716B1 (en) * 2013-03-15 2017-08-29 Lucasfilm Entertainment Company Ltd. Facial animation models
US20140279062A1 (en) 2013-03-15 2014-09-18 Rodan & Fields, Llc Consultant tool for direct selling
KR102138512B1 (en) 2013-03-26 2020-07-28 엘지전자 주식회사 Display Device And Controlling Method Thereof
US9460541B2 (en) * 2013-03-29 2016-10-04 Intel Corporation Avatar animation, social networking and touch screen applications
WO2014169024A2 (en) * 2013-04-09 2014-10-16 Carepics, Llc Protecting patient information in virtual medical consulations
KR102080183B1 (en) 2013-04-18 2020-04-14 삼성전자주식회사 Electronic device and method for unlocking in the electronic device
IL226047A (en) * 2013-04-29 2017-12-31 Hershkovitz Reshef May Method and system for providing personal emoticons
GB2515266B (en) 2013-05-09 2018-02-28 Disney Entpr Inc Manufacturing Process for 3D Printed Objects
CN105190699B (en) 2013-06-05 2019-09-03 英特尔公司 Karaoke incarnation animation based on facial motion data
US9378576B2 (en) 2013-06-07 2016-06-28 Faceshift Ag Online modeling for real-time facial animation
US9626493B2 (en) 2013-06-08 2017-04-18 Microsoft Technology Licensing, Llc Continuous digital content protection
CN103346957B (en) 2013-07-02 2016-12-28 北京播思无线技术有限公司 A kind of system and method according to contact person's message alteration contact head image expression
CN103413072A (en) 2013-07-27 2013-11-27 金硕澳门离岸商业服务有限公司 Application program protection method and device
US9317954B2 (en) * 2013-09-23 2016-04-19 Lucasfilm Entertainment Company Ltd. Real-time performance capture with on-the-fly correctives
US9508197B2 (en) 2013-11-01 2016-11-29 Microsoft Technology Licensing, Llc Generating an avatar from real time image data
US20160313896A1 (en) 2013-11-05 2016-10-27 Telefonaktiebolaget L M Ericsson (Publ) Methods of processing electronic files including combined close and delete, and related systems and computer program products
US10877629B2 (en) 2016-10-13 2020-12-29 Tung Inc. Conversion and display of a user input
US10528219B2 (en) * 2015-08-10 2020-01-07 Tung Inc. Conversion and display of a user input
US9489760B2 (en) * 2013-11-14 2016-11-08 Intel Corporation Mechanism for facilitating dynamic simulation of avatars corresponding to changing user performances as detected at computing devices
US20150172238A1 (en) * 2013-12-18 2015-06-18 Lutebox Ltd. Sharing content on devices with reduced user actions
US9477878B2 (en) 2014-01-28 2016-10-25 Disney Enterprises, Inc. Rigid stabilization of facial expressions
EP3100672A1 (en) 2014-01-30 2016-12-07 Konica Minolta, Inc. Organ image capturing device
KR102201738B1 (en) 2014-02-05 2021-01-12 LG Electronics Inc. Display device and method for controlling the same
WO2015138320A1 (en) 2014-03-09 2015-09-17 Vishal Gupta Management of group-sourced contacts directories, systems and methods
CN106104633A (en) 2014-03-19 2016-11-09 Intel Corporation Facial expression and/or interaction driven avatar apparatus and method
CN104935497B (en) 2014-03-20 2020-08-14 Tencent Technology (Shenzhen) Co., Ltd. Communication session method and device
CN106165392B (en) * 2014-03-31 2019-01-22 Fujifilm Corporation Image processing apparatus, camera and image processing method
US10845982B2 (en) * 2014-04-28 2020-11-24 Facebook, Inc. Providing intelligent transcriptions of sound messages in a messaging application
US20170080346A1 (en) * 2014-05-01 2017-03-23 Mohamad Abbas Methods and systems relating to personalized evolving avatars
CN105099861A (en) 2014-05-19 2015-11-25 Alibaba Group Holding Ltd. User emotion-based display control method and display control device
US20160019195A1 (en) * 2014-05-20 2016-01-21 Jesse Kelly SULTANIK Method and system for posting comments on hosted web pages
US20150350141A1 (en) * 2014-05-31 2015-12-03 Apple Inc. Message user interfaces for capture and transmittal of media and location content
US9766702B2 (en) 2014-06-19 2017-09-19 Apple Inc. User detection by a computing device
CN104112091A (en) 2014-06-26 2014-10-22 Xiaomi Inc. File locking method and device
MX2016016624A (en) 2014-06-27 2017-04-27 Microsoft Technology Licensing Llc Data protection based on user and gesture recognition.
US20160191958A1 (en) * 2014-12-26 2016-06-30 Krush Technologies, Llc Systems and methods of providing contextual features for digital communication
KR102103939B1 (en) 2014-07-25 2020-04-24 Intel Corporation Avatar facial expression animations with head rotation
US20160134840A1 (en) * 2014-07-28 2016-05-12 Alexa Margaret McCulloch Avatar-Mediated Telepresence Systems with Enhanced Filtering
US9536228B2 (en) 2014-07-31 2017-01-03 Gretel, LLC Contact management systems
US9561444B2 (en) 2014-08-01 2017-02-07 Electronic Arts Inc. Resolving graphical conflicts between objects
CN105374055B (en) 2014-08-20 2018-07-03 Tencent Technology (Shenzhen) Co., Ltd. Image processing method and device
US20160057087A1 (en) 2014-08-21 2016-02-25 Facebook, Inc. Processing media messages based on the capabilities of the receiving device
US20160055370A1 (en) 2014-08-21 2016-02-25 Futurewei Technologies, Inc. System and Methods of Generating User Facial Expression Library for Messaging and Social Networking Applications
KR101540544B1 (en) 2014-09-05 2015-07-30 서용창 Message service method using character, user device for performing the method, message application comprising the method
CN105139438B (en) 2014-09-19 2018-01-12 University of Electronic Science and Technology of China Video face cartoon generation method
WO2016045015A1 (en) * 2014-09-24 2016-03-31 Intel Corporation Avatar audio communication systems and techniques
CN106575445B (en) 2014-09-24 2021-02-05 Intel Corporation Fur avatar animation
EP3198560A4 (en) 2014-09-24 2018-05-09 Intel Corporation User gesture driven avatar apparatus and method
CN106575446B (en) * 2014-09-24 2020-04-21 Intel Corporation Facial motion driven animation communication system
US10572103B2 (en) 2014-09-30 2020-02-25 Apple Inc. Timeline view of recently opened documents
US20160105388A1 (en) * 2014-10-09 2016-04-14 Footspot, Inc. System and method for digital media capture and related social networking
US9430696B2 (en) 2014-10-09 2016-08-30 Sensory, Incorporated Continuous enrollment for face verification
US9491258B2 (en) 2014-11-12 2016-11-08 Sorenson Communications, Inc. Systems, communication endpoints, and related methods for distributing images corresponding to communication endpoints
US9799133B2 (en) 2014-12-23 2017-10-24 Intel Corporation Facial gesture driven animation of non-facial features
WO2016101131A1 (en) 2014-12-23 2016-06-30 Intel Corporation Augmented facial animation
WO2016101124A1 (en) 2014-12-23 2016-06-30 Intel Corporation Sketch selection for rendering 3d model avatar
JP6152125B2 (en) 2015-01-23 2017-06-21 Nintendo Co., Ltd. Program, information processing apparatus, information processing system, and avatar image generation method
JP6461630B2 (en) 2015-02-05 2019-01-30 Nintendo Co., Ltd. Communication system, communication device, program, and display method
JP6511293B2 (en) 2015-02-26 2019-05-15 NTT Data Corporation User monitoring system
CN104753766B (en) 2015-03-02 2019-03-22 Xiaomi Inc. Expression sending method and device
EP3268096A4 (en) * 2015-03-09 2018-10-10 Ventana 3D LLC Avatar control system
US10812429B2 (en) * 2015-04-03 2020-10-20 Glu Mobile Inc. Systems and methods for message communication
KR102450865B1 (en) 2015-04-07 2022-10-06 Intel Corporation Avatar keyboard
US20170069124A1 (en) * 2015-04-07 2017-03-09 Intel Corporation Avatar generation and animations
US20160350957A1 (en) 2015-05-26 2016-12-01 Andrew Woods Multitrack Virtual Puppeteering
US20170018289A1 (en) * 2015-07-15 2017-01-19 String Theory, Inc. Emoji as facetracking video masks
US10171985B1 (en) 2015-07-22 2019-01-01 Ginko LLC Method and apparatus for data sharing
CN108140020A (en) 2015-07-30 2018-06-08 Intel Corporation Emotion-enhanced avatar animation
US20170046507A1 (en) 2015-08-10 2017-02-16 International Business Machines Corporation Continuous facial recognition for adaptive data restriction
US11048739B2 (en) 2015-08-14 2021-06-29 Nasdaq, Inc. Computer-implemented systems and methods for intelligently retrieving, analyzing, and synthesizing data from databases
US20170083086A1 (en) 2015-09-18 2017-03-23 Kai Mazur Human-Computer Interface
US20170083524A1 (en) * 2015-09-22 2017-03-23 Riffsy, Inc. Platform and dynamic interface for expression-based retrieval of expressive media content
US11138207B2 (en) * 2015-09-22 2021-10-05 Google Llc Integrated dynamic interface for expression-based retrieval of expressive media content
WO2017079731A1 (en) 2015-11-06 2017-05-11 Mursion, Inc. Control system for virtual characters
US10025972B2 (en) * 2015-11-16 2018-07-17 Facebook, Inc. Systems and methods for dynamically generating emojis based on image analysis of facial features
US10664741B2 (en) 2016-01-14 2020-05-26 Samsung Electronics Co., Ltd. Selecting a behavior of a virtual agent
CN105844101A (en) 2016-03-25 2016-08-10 Huizhou TCL Mobile Communication Co., Ltd. Smart-watch-based emotion data processing method and system, and the smart watch
US10366090B2 (en) 2016-03-30 2019-07-30 Facebook, Inc. Displaying temporary profile content items on communication networks
KR20170112497A (en) * 2016-03-31 2017-10-12 LG Electronics Inc. Mobile terminal and method for controlling the same
US11320982B2 (en) 2016-05-18 2022-05-03 Apple Inc. Devices, methods, and graphical user interfaces for messaging
US10009536B2 (en) 2016-06-12 2018-06-26 Apple Inc. Applying a simulated optical effect based on data received from multiple camera sensors
US10607386B2 (en) 2016-06-12 2020-03-31 Apple Inc. Customized avatars and associated framework
EP3264251B1 (en) 2016-06-29 2019-09-04 Dassault Systèmes Generation of a color of an object displayed on a gui
US10348662B2 (en) 2016-07-19 2019-07-09 Snap Inc. Generating customized electronic messaging graphics
US20180047200A1 (en) * 2016-08-11 2018-02-15 Jibjab Media Inc. Combining user images and computer-generated illustrations to produce personalized animated digital avatars
CN117193617A (en) 2016-09-23 2023-12-08 Apple Inc. Avatar creation and editing
DK179471B1 (en) 2016-09-23 2018-11-26 Apple Inc. Image data for enhanced user interactions
JP6266736B1 (en) 2016-12-07 2018-01-24 Colopl Inc. Method for communicating via virtual space, program for causing computer to execute the method, and information processing apparatus for executing the program
US10528801B2 (en) * 2016-12-07 2020-01-07 Keyterra LLC Method and system for incorporating contextual and emotional visualization into electronic communications
JP6240301B1 (en) 2016-12-26 2017-11-29 Colopl Inc. Method for communicating via virtual space, program for causing computer to execute the method, and information processing apparatus for executing the program
US20180225263A1 (en) * 2017-02-06 2018-08-09 Microsoft Technology Licensing, Llc Inline insertion viewport
EP3580686B1 (en) 2017-02-07 2023-03-15 InterDigital VC Holdings, Inc. System and method to prevent surveillance and preserve privacy in virtual reality
US10438393B2 (en) 2017-03-16 2019-10-08 Linden Research, Inc. Virtual reality presentation of body postures of avatars
KR20180120449A (en) 2017-04-27 2018-11-06 Samsung Electronics Co., Ltd. Method for sharing profile image and electronic device implementing the same
KR102435337B1 (en) 2017-05-16 2022-08-22 Apple Inc. Emoji recording and sending
DK179867B1 (en) 2017-05-16 2019-08-06 Apple Inc. RECORDING AND SENDING EMOJI
US10397391B1 (en) 2017-05-22 2019-08-27 Ginko LLC Two-way permission-based directory of contacts
US20190114037A1 (en) 2017-10-17 2019-04-18 Blend Systems, Inc. Systems and methods for distributing customized avatars responsive to events
CN111386553A (en) 2017-11-29 2020-07-07 Snap Inc. Graphics rendering for electronic messaging applications
CN111742351A (en) 2018-02-23 2020-10-02 Samsung Electronics Co., Ltd. Electronic device for generating an image in which a 3D avatar corresponding to a face reflects the face's motion, and method of operating the same
KR102565755B1 (en) 2018-02-23 2023-08-11 Samsung Electronics Co., Ltd. Electronic device for displaying an avatar that performs a motion according to the movement of a facial feature point, and method of operating the same
US11062494B2 (en) 2018-03-06 2021-07-13 Didimo, Inc. Electronic messaging utilizing animatable 3D models
WO2019183276A1 (en) 2018-03-20 2019-09-26 Wright Rocky Jerome Augmented reality and messaging
US10607065B2 (en) 2018-05-03 2020-03-31 Adobe Inc. Generation of parameterized avatars
DK180078B1 (en) 2018-05-07 2020-03-31 Apple Inc. USER INTERFACE FOR AVATAR CREATION
CN110457092A (en) 2018-05-07 2019-11-15 Apple Inc. Avatar creation user interface
DK201970530A1 (en) 2019-05-06 2021-01-28 Apple Inc Avatar integration with multiple applications

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040250210A1 (en) * 2001-11-27 2004-12-09 Ding Huang Method for customizing avatars and heightening online safety
US20040179037A1 (en) * 2003-03-03 2004-09-16 Blattner Patrick D. Using avatars to communicate context out-of-band
US20070115349A1 (en) * 2005-11-03 2007-05-24 Currivan Bruce J Method and system of tracking and stabilizing an image transmitted using video telephony
US20070260984A1 (en) * 2006-05-07 2007-11-08 Sony Computer Entertainment Inc. Methods for interactive communications with real time effects and avatar environment interaction
US20160006987A1 (en) * 2012-09-06 2016-01-07 Wenlong Li System and method for avatar creation and synchronization
US20140092130A1 (en) * 2012-09-28 2014-04-03 Glen J. Anderson Selectively augmenting communications transmitted by a communication device
US20140282000A1 (en) * 2013-03-15 2014-09-18 Tawfiq AlMaghlouth Animated character conversation generator
US20150379752A1 (en) * 2013-03-20 2015-12-31 Intel Corporation Avatar-based transfer protocols, icon generation and doll animation
US20150156598A1 (en) * 2013-12-03 2015-06-04 Cisco Technology, Inc. Microphone mute/unmute notification
US20160110922A1 (en) * 2014-10-16 2016-04-21 Tal Michael HARING Method and system for enhancing communication by using augmented reality

Cited By (54)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10666920B2 (en) 2009-09-09 2020-05-26 Apple Inc. Audio alteration techniques
US10607386B2 (en) 2016-06-12 2020-03-31 Apple Inc. Customized avatars and associated framework
US11276217B1 (en) 2016-06-12 2022-03-15 Apple Inc. Customized avatars and associated framework
US11042616B2 (en) 2017-06-27 2021-06-22 Cirrus Logic, Inc. Detection of replay attack
US10770076B2 (en) 2017-06-28 2020-09-08 Cirrus Logic, Inc. Magnetic detection of replay attack
US11164588B2 (en) 2017-06-28 2021-11-02 Cirrus Logic, Inc. Magnetic detection of replay attack
US10853464B2 (en) 2017-06-28 2020-12-01 Cirrus Logic, Inc. Detection of replay attack
US11704397B2 (en) 2017-06-28 2023-07-18 Cirrus Logic, Inc. Detection of replay attack
US11755701B2 (en) 2017-07-07 2023-09-12 Cirrus Logic Inc. Methods, apparatus and systems for authentication
US11042617B2 (en) 2017-07-07 2021-06-22 Cirrus Logic, Inc. Methods, apparatus and systems for biometric processes
US11714888B2 (en) 2017-07-07 2023-08-01 Cirrus Logic Inc. Methods, apparatus and systems for biometric processes
US11042618B2 (en) 2017-07-07 2021-06-22 Cirrus Logic, Inc. Methods, apparatus and systems for biometric processes
US10984083B2 (en) 2017-07-07 2021-04-20 Cirrus Logic, Inc. Authentication of user using ear biometric data
US11829461B2 (en) 2017-07-07 2023-11-28 Cirrus Logic Inc. Methods, apparatus and systems for audio playback
US10839808B2 (en) 2017-10-13 2020-11-17 Cirrus Logic, Inc. Detection of replay attack
US11270707B2 (en) 2017-10-13 2022-03-08 Cirrus Logic, Inc. Analysing speech signals
US10847165B2 (en) 2017-10-13 2020-11-24 Cirrus Logic, Inc. Detection of liveness
US20190114497A1 (en) * 2017-10-13 2019-04-18 Cirrus Logic International Semiconductor Ltd. Detection of liveness
US11705135B2 (en) 2017-10-13 2023-07-18 Cirrus Logic, Inc. Detection of liveness
US11017252B2 (en) * 2017-10-13 2021-05-25 Cirrus Logic, Inc. Detection of liveness
US11023755B2 (en) * 2017-10-13 2021-06-01 Cirrus Logic, Inc. Detection of liveness
US20190114496A1 (en) * 2017-10-13 2019-04-18 Cirrus Logic International Semiconductor Ltd. Detection of liveness
US10832702B2 (en) 2017-10-13 2020-11-10 Cirrus Logic, Inc. Robustness of speech processing system against ultrasound and dolphin attacks
US11051117B2 (en) 2017-11-14 2021-06-29 Cirrus Logic, Inc. Detection of loudspeaker playback
US11276409B2 (en) 2017-11-14 2022-03-15 Cirrus Logic, Inc. Detection of replay attack
US11475899B2 (en) 2018-01-23 2022-10-18 Cirrus Logic, Inc. Speaker identification
US11735189B2 (en) 2018-01-23 2023-08-22 Cirrus Logic, Inc. Speaker identification
US11264037B2 (en) 2018-01-23 2022-03-01 Cirrus Logic, Inc. Speaker identification
US11694695B2 (en) 2018-01-23 2023-07-04 Cirrus Logic, Inc. Speaker identification
US20190304472A1 (en) * 2018-03-30 2019-10-03 Qualcomm Incorporated User authentication
US10733996B2 (en) * 2018-03-30 2020-08-04 Qualcomm Incorporated User authentication
US10720166B2 (en) * 2018-04-09 2020-07-21 Synaptics Incorporated Voice biometrics systems and methods
USD898761S1 (en) * 2018-04-26 2020-10-13 Lg Electronics Inc. Display screen with graphical user interface
US10818296B2 (en) * 2018-06-21 2020-10-27 Intel Corporation Method and system of robust speaker recognition activation
US11631402B2 (en) 2018-07-31 2023-04-18 Cirrus Logic, Inc. Detection of replay attack
US10915614B2 (en) 2018-08-31 2021-02-09 Cirrus Logic, Inc. Biometric authentication
US11748462B2 (en) 2018-08-31 2023-09-05 Cirrus Logic Inc. Biometric authentication
US11037574B2 (en) 2018-09-05 2021-06-15 Cirrus Logic, Inc. Speaker recognition and speaker change detection
US11527265B2 (en) * 2018-11-02 2022-12-13 BriefCam Ltd. Method and system for automatic object-aware video or audio redaction
US11189071B2 (en) 2019-02-07 2021-11-30 Samsung Electronics Co., Ltd. Electronic device for providing avatar animation and method thereof
US10921958B2 (en) 2019-02-19 2021-02-16 Samsung Electronics Co., Ltd. Electronic device supporting avatar recommendation and download
WO2020171385A1 (en) * 2019-02-19 2020-08-27 Samsung Electronics Co., Ltd. Electronic device supporting avatar recommendation and download
US11289067B2 (en) * 2019-06-25 2022-03-29 International Business Machines Corporation Voice generation based on characteristics of an avatar
US10922570B1 (en) * 2019-07-29 2021-02-16 NextVPU (Shanghai) Co., Ltd. Entering of human face information into database
US11131697B2 (en) * 2019-08-27 2021-09-28 Sean C. Butler System and method for combining a remote audio source with an animatronically controlled puppet
US11868592B2 (en) 2019-09-27 2024-01-09 Apple Inc. User interfaces for customizing graphical objects
US11609640B2 (en) * 2020-06-21 2023-03-21 Apple Inc. Emoji user interfaces
US11256402B1 (en) * 2020-08-12 2022-02-22 Facebook, Inc. Systems and methods for generating and broadcasting digital trails of visual media
USD960898S1 (en) 2020-08-12 2022-08-16 Meta Platforms, Inc. Display screen with a graphical user interface
USD960899S1 (en) 2020-08-12 2022-08-16 Meta Platforms, Inc. Display screen with a graphical user interface
USD1020798S1 (en) * 2020-12-21 2024-04-02 Samsung Electronics Co., Ltd. Display screen or portion thereof with graphical user interface
US20230135516A1 (en) * 2021-10-29 2023-05-04 Snap Inc. Voice notes with changing effects
US20230410396A1 (en) * 2022-06-17 2023-12-21 Lemon Inc. Audio or visual input interacting with video creation
US20240015262A1 (en) * 2022-07-07 2024-01-11 At&T Intellectual Property I, L.P. Facilitating avatar modifications for learning and other videotelephony sessions in advanced networks

Also Published As

Publication number Publication date
US10379719B2 (en) 2019-08-13
DK201770720A1 (en) 2019-01-29
DK179867B1 (en) 2019-08-06
US20180335927A1 (en) 2018-11-22
US10845968B2 (en) 2020-11-24
CN111563943A (en) 2020-08-21
CN111563943B (en) 2022-09-09
DK201770721A1 (en) 2019-01-29
DK179948B1 (en) 2019-10-22
US20180335930A1 (en) 2018-11-22
US20180335929A1 (en) 2018-11-22
CN115393485A (en) 2022-11-25
US10521091B2 (en) 2019-12-31

Similar Documents

Publication Publication Date Title
US20180336716A1 (en) Voice effects based on facial expressions
US10861210B2 (en) Techniques for providing audio and video effects
KR102367143B1 (en) Voice effects based on facial expressions
US20230316643A1 (en) Virtual role-based multimodal interaction method, apparatus and system, storage medium, and terminal
US10360716B1 (en) Enhanced avatar animation
US10019825B2 (en) Karaoke avatar animation based on facial motion data
US20210352380A1 (en) Characterizing content for audio-video dubbing and other transformations
US20140278403A1 (en) Systems and methods for interactive synthetic character dialogue
KR101492359B1 (en) Input support device, input support method, and recording medium
KR20070020252A (en) Method of and system for modifying messages
US11653072B2 (en) Method and system for generating interactive media content
JP2016511837A (en) Voice change for distributed story reading
US20200166670A1 (en) Personalizing weather forecast
CN113392201A (en) Information interaction method, information interaction device, electronic equipment, medium and program product
US10812430B2 (en) Method and system for creating a mercemoji
US20170169857A1 (en) Method and Electronic Device for Video Play
WO2022089224A1 (en) Video communication method and apparatus, electronic device, computer readable storage medium, and computer program product
WO2022242706A1 (en) Multimodal based reactive response generation
KR20240038941A (en) Method and system for generating avatar based on text
US10347299B2 (en) Method to automate media stream curation utilizing speech and non-speech audio cue analysis
KR102086780B1 (en) Method, apparatus and computer program for generating cartoon data
JP4917920B2 (en) Content generation apparatus and content generation program
KR102184053B1 (en) Method for generating a webtoon video that delivers each character's lines in a different converted voice
CN115393484A (en) Method and device for generating avatar animation, electronic device and storage medium
US10803114B2 (en) Systems and methods for generating audio or video presentation heat maps

Legal Events

Date Code Title Description
AS Assignment

Owner name: APPLE INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:RAMPRASHAD, SEAN A.;AVENDANO, CARLOS M.;LINDAHL, ARAM M.;SIGNING DATES FROM 20180227 TO 20180228;REEL/FRAME:045081/0404

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION