WO2017157272A1 - Information processing method and terminal (一种信息处理方法及终端) - Google Patents

Information processing method and terminal (一种信息处理方法及终端)

Info

Publication number
WO2017157272A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
sticker
media information
video
type
Prior art date
Application number
PCT/CN2017/076576
Other languages
English (en)
French (fr)
Inventor
汪倩怡
戴阳刚
应磊
吴发强
崔凌睿
邬振海
高雨
Original Assignee
腾讯科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to KR1020187026680A (KR102135215B1)
Priority to JP2018527883A (JP2019504532A)
Publication of WO2017157272A1
Priority to US15/962,663 (US11140436B2)

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/4104Peripherals receiving signals from specially adapted client devices
    • H04N21/4126The peripheral being portable, e.g. PDAs or mobile phones
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • G06T11/60Editing figures and text; Combining figures or text
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/433Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N21/4334Recording operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/4104Peripherals receiving signals from specially adapted client devices
    • H04N21/4122Peripherals receiving signals from specially adapted client devices additional display device, e.g. video projector
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/414Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance
    • H04N21/41407Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance embedded in a portable device, e.g. video client on a mobile phone, PDA, laptop
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/4223Cameras
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/61Control of cameras or camera modules based on recognised objects
    • H04N23/611Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/62Control of parameters via user interfaces
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/80Camera processing pipelines; Components thereof
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/76Television signal recording
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44213Monitoring of end-user related data
    • H04N21/44218Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program

Definitions

  • the present application relates to communication technologies, and in particular, to an information processing method and a terminal.
  • In the related art, a recorded video can be combined with other information, related or unrelated to the video content, to obtain a synthesized video.
  • However, the operation of attaching such information is complicated and cumbersome: the user has to go to a material library and manually select the information, including information related to particular video content in the video.
  • This interaction mode is very complicated and requires multiple interactions, which inevitably leads to low processing efficiency, repeated back-and-forth interaction, and a high processing time cost.
  • Moreover, the synthesized video that is finally obtained may be unsatisfactory and may not meet the real user requirements, so the user may synthesize it again, and the information processing cost of video synthesis on the terminal keeps increasing.
  • In the related art, there is no effective solution to this problem.
  • The embodiments of the present application provide an information processing method and a terminal, which solve at least the above problems in the prior art.
  • an information processing method comprising:
  • the terminal acquires a first operation to trigger collection of the first media information
  • when the terminal detects, in the process of collecting the first media information, a change of the expression in a face area that meets a preset condition or a change of the user action in the collection frame, the detected change amount of the expression change or the user action change is reported to the server as key information;
  • the terminal receives second media information that is pushed by the server and corresponds to the key information; and
  • the first media information and the second media information are video-synthesized.
  • a terminal comprising:
  • a triggering unit configured to acquire a first operation to trigger collection of the first media information
  • the detecting unit when detecting the change of the expression in the face area that meets the preset condition or the change of the user action in the collection frame during the process of collecting the first media information, the detected expression change or the user action changes The amount of change is reported to the server as key information;
  • a receiving unit configured to receive second media information that is pushed by the server and that corresponds to the key information
  • a synthesizing unit configured to perform video synthesis on the first media information and the second media information.
  • a nonvolatile storage medium storing a program which, when executed by a computer device including one or more processors, causes the computer device to perform the information processing method described above.
  • The information processing method in the embodiments of the present application includes: the terminal acquires a first operation to trigger collection of the first media information; when the terminal detects, in the process of collecting the first media information, a change of the expression in a face area that meets a preset condition or a change of the user action in the collection frame, the obtained change amount is reported to the server as key information; the terminal receives second media information corresponding to the key information pushed by the server; and the first media information and the second media information are video-synthesized according to a preset configuration.
  • With the embodiments of the present application, the corresponding second media information is obtained from the server based on the change amount, and the first media information and the second media information are video-synthesized according to the preset configuration, so that the synthesized video can be played back after the first media information has been collected.
  • In the synthesized video, the corresponding second media information is displayed at a specified position and a specified time of the first media information. Because the second media information does not need to be manually selected and added by the user, the operation flow is simplified and the processing efficiency is improved; and because the second media information is requested with the detection result (such as the expression change or the user action change) obtained in the process of collecting the first media information, the second media information also better matches real user needs.
  • The position and time of the second media information can also be matched to detection results such as expression changes or user action changes, so the position and time point are accurate as well. This not only reduces repeated interactions but also avoids subsequent re-adjustment and re-synthesis, which reduces the information processing cost and the time cost of video synthesis.
  • FIG. 1 is a schematic diagram of hardware entities of each party performing information interaction in an embodiment of the present application.
  • FIG. 2 is a schematic diagram of an implementation process of Embodiment 1 of the present application.
  • FIG. 3 is a schematic diagram of an application scenario applied to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of triggering video recording in an embodiment of the present application.
  • FIG. 13 is a schematic flowchart of an implementation process of Embodiment 2 of the present application.
  • FIG. 14 is a schematic structural diagram of a third embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of a hardware component of Embodiment 4 of the present application.
  • FIG. 16 is a schematic diagram of a scenario in which RGB and transparency are separately stored in an embodiment of the present application.
  • FIG. 17 is a system architecture diagram of an example to which an embodiment of the present application is applied.
  • FIG. 1 is a schematic diagram of hardware entities of each party performing information interaction in the embodiment of the present application. FIG. 1 includes a server 11 and terminal devices 21, 22, 23, and 24, wherein the terminal devices 21, 22, 23, and 24 exchange information with the server through a wired or wireless network.
  • The terminal device may be a mobile phone, a desktop computer, a PC, an all-in-one machine, and the like. Various applications that meet the user's daily and work needs are installed on the terminal device: if the user likes to take pictures and record video, applications such as an image processing application and a video processing application are installed; a social application may also be installed for social sharing needs.
  • the processing results obtained by using the image processing application and the video processing application can also be shared by the social application.
  • The terminal device periodically obtains the update data packet of each application from the server and saves it locally so that it is available when needed.
  • The terminal device starts an application (such as a video processing application) and acquires a first operation, such as an operation of turning on video recording, thereby triggering collection of the first media information, such as a video.
  • When the terminal device detects, in the process of collecting the first media information, a change of the expression in a face region that meets the preset condition or a change of the user action in the collection frame, it reports the obtained change amount to the server as key information.
  • For example, the expression change in the face area may be a smile, and the user action change may be blinking or making a scissors-hand gesture.
  • the terminal receives second media information, such as a sticker, corresponding to the key information that is pushed by the server; and performs video synthesis on the first media information and the second media information.
  • In this way, the corresponding second media information is obtained from the server based on the change amount, and the first media information and the second media information are video-synthesized, so that the synthesized video can be replayed after the first media information has been collected.
  • corresponding second media information is displayed at a specified position and a specified time of the first media information.
  • Because the second media information does not need to be manually selected and added by the user, the operation flow is simplified and the processing efficiency is improved; and because the second media information is requested with the detection result (such as the expression change or the user action change) obtained in the process of collecting the first media information, the second media information also better matches real user needs.
  • FIG. 1 is only an example of a system architecture that implements the embodiments of the present application.
  • the embodiment of the present application is not limited to the system structure described in FIG. 1 above, and various embodiments of the present application are proposed based on the system architecture.
  • the information processing method of the embodiment of the present application is as shown in FIG. 2, and the method includes:
  • Step 101 The terminal acquires a first operation to trigger collection of the first media information.
  • the user is lying on the sofa using a terminal device such as the mobile phone 11.
  • the user interface of the mobile phone 11 is as shown in FIG. 4, and includes various types of application icons, such as a music play icon, a function setting icon, a mail sending and receiving icon, and the like.
  • The user performs the first operation, for example tapping the video processing application icon identified by A1 with a finger, and enters the video recording process, thereby triggering collection of the first media information, such as a video. For example, the user can record a scene in the room or take a self-portrait.
  • Step 102 When the terminal detects, in the process of collecting the first media information, a change of the expression in a face area that meets the preset condition or a change of the user action in the collection frame, the terminal reports the obtained change amount to the server as key information.
  • For example, the terminal can capture expression changes in the face region, such as smiling, crying, or frowning.
  • The terminal device can also detect user action changes in the collection frame (or framing frame), such as a scissors-hand gesture. This detection is not limited to the face area. It is also possible to combine the expression changes in the face area with the user action changes, for example combining the scissors-hand gesture with a smile in the facial expression for combined recognition.
  • Face recognition technology is based on the facial features of a person. A face image or video stream is collected during video recording, and it is first determined whether a face is present in the video stream; if so, the position and size of the face are given, the position of each main facial organ is located, and the initial position and form of each of the facial features are obtained.
  • When the form changes, for example in a smile, the displacement and deformation of the upper and lower lips relative to their initial form indicate that the facial features have changed, and this change can be used to recognize the change of expression.
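  • As a rough illustration of the detection step just described (determining whether a face is present and obtaining its position and size), the following sketch uses OpenCV's bundled Haar cascade detector; the application does not prescribe any particular detection library, so this choice and the helper name are assumptions.

```python
import cv2

# Haar cascade face detector bundled with opencv-python; the application does not
# prescribe any particular detector, so this choice is an assumption.
_face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face(frame_bgr):
    """Return (x, y, w, h) of the first detected face area, or None if no face."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = _face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                      # no face in this frame
    x, y, w, h = faces[0]                # position and size of the face area
    return int(x), int(y), int(w), int(h)
```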
  • The face recognition in the embodiments of the present application differs from conventional face recognition. Conventional face recognition identifies the user's identity through a constructed face recognition system by comparing the recognized face with known faces, so as to perform identity verification and identity lookup.
  • Expression recognition can be divided into four stages: acquisition and preprocessing of face images; face detection; expression feature extraction; and expression classification. Relying only on the face recognition and positioning mechanism can be inaccurate, whereas an expression recognition mechanism is a more accurate processing strategy.
  • Expression recognition is closely related to face recognition. For example, the positioning in face detection and face tracking are similar, but the feature extraction is different.
  • the features extracted by face recognition mainly focus on individual differences and characteristics of different faces, while facial expressions exist as interference signals, so face recognition does not pay much attention to facial expressions.
  • The embodiments of the present application need to pay attention to expression changes in order to trigger the corresponding second media information, so individual differences can be ignored, and attention is paid instead to extracting the feature differences of the face in different expression modes.
  • Feature extraction is the core step in facial expression recognition, which determines the final recognition result and affects the recognition rate.
  • the feature extraction can be divided into: static image feature extraction and moving image feature extraction.
  • In static image feature extraction, the deformation features of the expression (or the transient features of the expression) are extracted.
  • In moving image feature extraction, not only the expression change characteristics of each frame are extracted, but also the motion characteristics of the continuous sequence.
  • Deformation feature extraction can rely on neutral expressions or models to compare the generated expressions with neutral expressions to extract deformation features, while the extraction of motion features is directly dependent on the facial changes produced by the expressions.
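  • The "change amount" that is later reported as key information can be pictured as the displacement of facial landmarks relative to their initial (neutral) form. The sketch below illustrates that idea with plain coordinate arithmetic; the landmark names and example values are assumptions, not values taken from the application.

```python
def expression_change_amount(initial_landmarks, current_landmarks):
    """Total displacement of facial landmarks relative to their initial (neutral) form.

    Both arguments map a landmark name (e.g. "mouth_left", "upper_lip") to an
    (x, y) pixel coordinate; the landmark set itself is an assumption.
    """
    total = 0.0
    for name, (x0, y0) in initial_landmarks.items():
        x1, y1 = current_landmarks[name]
        total += ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    return total


# Example: mouth corners moving apart relative to the initial form suggests a smile.
initial = {"mouth_left": (100, 180), "mouth_right": (140, 180)}
current = {"mouth_left": (95, 176), "mouth_right": (146, 176)}
print(expression_change_amount(initial, current))  # change amount to report as key information
```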
  • Step 103 The terminal receives second media information corresponding to the key information that is pushed by the server.
  • A specific implementation of this step may be: after the key information is reported to the server in step 102, the server matches the corresponding second media information, such as sticker information, from the material library according to the key information and pushes it to the terminal, so that video synthesis with the first media information can subsequently be performed in step 104.
  • The user does not need to manually select the sticker information; instead, the sticker information matched according to the key information is automatically pushed to the terminal and automatically synthesized with the first media information (for example, the video and the sticker information are superimposed) in the process of collecting the first media information (such as a video), to obtain the video processing result.
  • the sticker information is displayed at a specified location and a specified time of the first media information (such as a video).
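  • On the server side, the matching described above amounts to looking up second media information (sticker material and its description file) by the reported key information. A minimal sketch, assuming an in-memory material library keyed by expression, action, and text keywords (all names and fields hypothetical):

```python
# Hypothetical in-memory material library; in practice this would be a database of
# sticker assets and their description files.
MATERIAL_LIBRARY = {
    "smile": {"sticker": "blush.png", "anchor": "cheeks", "duration_s": 2.0},
    "blink": {"sticker": "cartoon_eyes.png", "anchor": "eyes", "duration_s": 1.5},
    "year-end award": {"sticker": "coins.png", "anchor": "center", "duration_s": 3.0},
}

def match_second_media(key_information: str):
    """Return the sticker entry matched by the reported key information, or None."""
    for keyword, entry in MATERIAL_LIBRARY.items():
        if keyword in key_information:
            return entry
    return None  # nothing is pushed if no material matches
```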
  • Step 104 Perform video synthesis on the first media information and the second media information.
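  • The following sketch ties steps 101 to 104 together from the terminal's point of view. Every collaborator (camera, change detector, server interface, synthesizer) is passed in as a parameter because the application does not prescribe concrete APIs; the threshold value is likewise an assumption.

```python
CHANGE_THRESHOLD = 15.0  # assumed value; the application does not specify a threshold

def record_with_auto_stickers(camera, detect_change, server, synthesize):
    """Illustrative client-side flow for steps 101-104; every collaborator is passed
    in because the application does not prescribe concrete APIs.

    camera        -- iterable of frames collected as the first media information
    detect_change -- callable(frame) -> change amount of the expression or user
                     action, or None when nothing relevant is detected
    server        -- object whose report_key_information(key) returns the pushed
                     second media information (sticker + description file) or None
    synthesize    -- callable(frames, second_media) performing the video synthesis
    """
    frames = []
    for frame in camera:
        frames.append(frame)
        change = detect_change(frame)              # expression or user-action change
        if change is None or change < CHANGE_THRESHOLD:
            continue
        key_information = {"change_amount": change}
        second_media = server.report_key_information(key_information)
        if second_media is not None:
            synthesize(frames, second_media)       # step 104: video synthesis
    return frames
```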
  • the key information further includes: text information in the first media information.
  • the information processing method further includes: detecting the text information in the process of collecting the first media information, and reporting the information to the server as key information.
  • FIG. 5 shows an application scenario of the prior art. The text information "A red fire" identified by A2 is included in the video information, and after the video information has been recorded, the sticker information "Red Fire" indicated by A2' is added.
  • The sticker information is manually selected from the server material library through multiple interactions between the terminal and the server, and then attached to the video information that has already been recorded.
  • FIG. 6 shows another application scenario of the prior art. Specifically, the text information “boyfriend” identified by A3 is included in the video information. After the video information recording is completed, the sticker information "boyfriend” as identified by A3' is added.
  • The sticker information is manually selected from the server material library through multiple interactions between the terminal and the server, and then attached to the video information that has already been recorded. This kind of processing is very cumbersome and requires multiple user interactions.
  • Moreover, the sticker found in this way is not necessarily what the user really needs; and even if it is, the user still has to add it manually to the video information that has already been recorded, for example by moving the sticker to the appropriate location in the video, and so on.
  • the video shown in FIG. 7 includes the text information “eat not fat” identified by A4, which is sent to the server as key information.
  • the matching sticker information obtained based on the key information is identified as A4'.
  • the video shown in FIG. 8 includes the text information "boyfriend” identified by A5, which is transmitted as a key message to the server.
  • the matching sticker information obtained based on the key information is identified as A5'.
  • B1 is used to identify the control button during video recording
  • B2 is used to identify the playback button after the video recording is over.
  • FIG. 9 is a schematic diagram of playing back the video after synthesizing the sticker information and the video at a suitable position and time point during recording of the video.
  • FIG. 10 is a schematic diagram of playing back a video, after the video has been recorded, in which the sticker information and the video have been synthesized at a suitable position and time point, in an embodiment of the present application. When the voice "year-end award" occurs in the corresponding recorded video information, the text information corresponding to the voice can be displayed on the video interface.
  • The synthesized sticker information is also displayed on the video interface: a dynamic sticker effect displays "a lot of year-end awards" together with a currency-unit indicator, combining it with the text "A lot of year-end awards."
  • sticker shapes can be obtained by recognizing the facial expression or the user's motion.
  • the user action and the voice can be combined.
  • For example, the user action can be a happy blink of the user.
  • During this "happy blink" period, other sticker information, such as that identified by A6', may also be displayed on the video interface, for example "the eyes become two currency symbols".
  • The user action can also be a finger snap, which triggers the display of "the eyes become two currency symbols" as indicated by A6' in FIG. 11, or the display of the sticker information "a lot of year-end awards" as shown in FIG. 10.
  • FIG. 12 shows another application example using the embodiment of the present application.
  • other sticker shapes can also be obtained by recognizing facial expressions.
  • For example, when the voice played in the corresponding recorded video information is "I am so beautiful", as identified by A7, the position of the cheek on the face is recognized, and the sticker information shown by A7' is superimposed at that position.
  • The sticker information is red cheeks (a blush), which belongs to the facial-features sticker type.
  • In other words, when "I am so beautiful" occurs, the synthesized sticker information also appears on the video interface, so that the person's face appears flushed.
  • the information processing method of the embodiment of the present application is as shown in FIG. 13 , and the method includes:
  • Step 201 The terminal starts the application, acquires the first operation, and triggers collection of the first media information.
  • the user is lying on the sofa using a terminal device such as the mobile phone 11.
  • the user interface of the mobile phone 11 is as shown in FIG. 4, and includes various types of application icons, such as a music play icon, a function setting icon, a mail sending and receiving icon, and the like.
  • the user performs the first operation, such as clicking the video processing application icon identified by the A1 with a finger to enter the process of video recording, thereby triggering the collection of the first media information, such as a video. For example, you can record a scene in a room, or take a self-portrait for yourself.
  • Step 202 When the terminal detects, in the process of collecting the first media information, a change of the expression in a face area that meets the preset condition or a change of the user action in the collection frame, the terminal reports the obtained change amount to the server as key information.
  • For example, the terminal device can capture expression changes in the face region, such as smiling, crying, or frowning.
  • The terminal device can also detect user action changes within the collection frame (or framing frame), for example a scissors-hand gesture. This detection is not limited to the face area; it is also possible to combine the expression changes in the face area with the user action changes, for example combining the scissors-hand gesture with a smile in the facial expression for combined recognition.
  • Face recognition technology is based on the facial features of a person. A face image or video stream is collected during video recording, and it is first determined whether a face is present in the video stream; if so, the position and size of the face are given, the position of each main facial organ is located, and the initial position and form of each of the facial features are obtained.
  • When the form changes, for example in a smile, the displacement and deformation of the upper and lower lips relative to their initial form indicate that the facial features have changed, and this change can be used to recognize the change of expression.
  • The face recognition in the embodiments of the present application differs from conventional face recognition. Conventional face recognition identifies the user's identity through a constructed face recognition system by comparing the recognized face with known faces, so as to perform identity verification and identity lookup.
  • Expression recognition can be divided into four stages: acquisition and preprocessing of face images; face detection; expression feature extraction; and expression classification. Relying only on the face recognition and positioning mechanism can be inaccurate, whereas an expression recognition mechanism is a more accurate processing strategy.
  • Expression recognition is closely related to face recognition; for example, the positioning in face detection and face tracking are similar, but the feature extraction differs.
  • For example, the features extracted by face recognition mainly focus on individual differences and characteristics of different faces, while facial expressions exist as interference signals; that is to say, face recognition does not pay much attention to facial expressions.
  • the embodiment of the present application needs to pay attention to the change of the expression to trigger the corresponding second media information.
  • Feature extraction is the core step in facial expression recognition, which determines the final recognition result and affects the recognition rate.
  • the feature extraction can be divided into: static image feature extraction and moving image feature extraction.
  • In static image feature extraction, the deformation features of the expression (or the transient features of the expression) are extracted. In moving image feature extraction, not only the expression change characteristics of each frame are extracted, but also the motion characteristics of the continuous sequence.
  • Deformation feature extraction can rely on neutral expressions or models, comparing the generated expression with the neutral expression to extract the deformation features, while the extraction of motion features depends directly on the facial changes produced by the expression.
  • Step 203 The server selects, from the material library, a description file of the second media information and the second media information corresponding to the key information.
  • Step 204 The terminal receives a description file of the second media information and the second media information corresponding to the key information that is pushed by the server.
  • A specific implementation of this step may be: after the key information is reported to the server in step 202, the server matches the corresponding second media information, such as sticker information, from the material library according to the key information and pushes it to the terminal, so that video synthesis with the first media information can subsequently be performed in step 205.
  • The user does not need to manually select the sticker information; instead, the sticker information matched according to the key information is automatically pushed to the terminal and automatically synthesized with the first media information (for example, the video and the sticker information are superimposed) in the process of collecting the first media information (such as a video), to obtain the video processing result.
  • the sticker information is displayed at a specified location and a specified time of the first media information (such as a video).
  • The description file of the second media information and the second media information corresponding to the key information may be sent together or separately, depending on the current network condition: if the network condition is good, they are sent together; if it is not, they can be sent separately to reduce the risk of data loss caused by the poor network.
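  • A minimal sketch of that server-side choice, with the network-quality check and the payload layout standing in as assumptions for whatever the implementation actually uses:

```python
def push_second_media(send, sticker_bytes, description_file, network_is_good):
    """Push the sticker material and its description file to the terminal.

    send            -- callable delivering one payload to the terminal (assumed API)
    network_is_good -- result of whatever network-quality check the server uses
    """
    if network_is_good:
        # Good network condition: send the material and its description file together.
        send({"sticker": sticker_bytes, "description": description_file})
    else:
        # Poor network condition: send them separately so that a lost packet
        # does not take both pieces of data with it.
        send({"description": description_file})
        send({"sticker": sticker_bytes})
```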
  • Step 205 Perform video synthesis on the first media information and the second media information.
  • the key information further includes: text information in the first media information.
  • the method further includes: detecting the text information in the process of collecting the first media information, and reporting the information to the server as key information.
  • FIG. 5 shows an application scenario of the prior art. The text information "A red fire" identified by A2 is included in the video information, and after the video information has been recorded, the sticker information "Red Fire" indicated by A2' is added.
  • the sticker information is manually selected from the server material library through the terminal interaction with the server multiple times, and then the sticker information is attached to the video information that has been recorded.
  • FIG. 6 is another application scenario of the prior art.
  • the video information includes a text message “boyfriend” identified by A3.
  • the sticker information "boyfriend" as identified by A3' is added.
  • The sticker information is manually selected from the server material library through multiple interactions between the terminal and the server, and then attached to the video information that has already been recorded. This kind of processing is very cumbersome and requires multiple user interactions.
  • Moreover, the sticker found in this way is not necessarily what the user really needs; and even if it is, the user still has to add it manually to the video information that has already been recorded, for example by moving the sticker to the appropriate location in the video, and so on.
  • the video shown in FIG. 7 includes the text information “eat not fat” identified by A4, which is sent to the server as key information.
  • the matching sticker information obtained based on the key information is identified as A4'.
  • the video shown in FIG. 8 includes the text information "Eat not fat” identified by A5, which is sent to the server as key information.
  • the matching sticker information obtained based on the key information is identified as A5'.
  • B1 is used to identify the control button during video recording
  • B2 is used to identify the playback button after the video recording is over.
  • FIG. 9 is a schematic diagram of playing back the video after synthesizing the sticker information and the video at a suitable position and time point during recording of the video.
  • FIG. 10 is a schematic diagram of playing back a video, after the video has been recorded, in which the sticker information and the video have been synthesized at a suitable position and time point, in an embodiment of the present application. When the voice "year-end award" occurs in the corresponding recorded video information, the text information corresponding to the voice can be displayed on the video interface.
  • The synthesized sticker information is also displayed on the video interface: a dynamic sticker effect displays "a lot of year-end awards" together with a currency-unit indicator, combining it with the text "A lot of year-end awards."
  • sticker shapes can be obtained by recognizing the facial expression or the user's motion.
  • the user action and the voice can be combined.
  • the user action can be a happy blink of the user.
  • In addition to the display described above, during this "happy blink" time period, other sticker information, such as "the eyes become two currency symbols" identified by A6' (see FIG. 11), can also be displayed on the video interface.
  • The user action may also be a finger snap.
  • FIG. 12 shows another application example using the embodiment of the present application.
  • other sticker shapes can also be obtained by recognizing facial expressions.
  • For example, when the voice played in the corresponding recorded video information is "I am so beautiful", as identified by A7, the position of the cheek on the face is recognized, and the sticker information shown by A7' is superimposed at that position.
  • The sticker information is red cheeks (a blush), which belongs to the facial-features sticker type.
  • In other words, when "I am so beautiful" occurs, the synthesized sticker information also appears on the video interface, so that the person's face appears flushed.
  • the video synthesis of the first media information and the second media information includes:
  • A first implementation solution: in response to the expression change or the user action change, a corresponding feature detection result is acquired, the second media information is video-synthesized with the first media information according to the feature detection result and the configuration in the description file of the second media information, and the second media information is displayed at the position specified in the first media information at a specified time point or within a specified time period.
  • A second implementation solution: in response to the text information, the second media information is video-synthesized with the first media information according to the configuration in the description file of the second media information, and the second media information is displayed at the position specified in the first media information at a specified time point or within a specified time period.
  • The difference between the two schemes is as follows. In the first scheme, the feature coordinates (part or all of the information in the feature detection result) must be obtained, and they are combined to determine the suitable designated position in the video information at which the sticker information is to be placed, while the description file of the second media information determines the time point.
  • In the second scheme, the placement of the sticker information has a fixed position and a fixed time requirement, so the sticker information can be superimposed on the video information at the specified position and a suitable time point according to the description file.
  • The second media information includes at least one of the following categories: 1) a first type of sticker information that is triggered by the expression change or the user action change, such as a facial-features sticker and a trigger-type sticker; 2) a second type of sticker information that is displayed without being triggered by the expression change or the user action change, such as a background sticker.
  • Responding to the expression change or the user action change, acquiring the corresponding feature detection result, and video-synthesizing the second media information with the first media information according to the feature detection result and the configuration of the description file of the second media information includes:
  • A2. Detecting the change of the feature coordinates caused by the expression change or the user action change, positioning from the initial coordinates to the target coordinates, and determining the superposition position according to the position point obtained by the target coordinate positioning or the position area defined from the initial coordinates to the target coordinates.
  • A4. Perform video synthesis on the second media information and the first media information according to the determined location and the display time of the parsed first type of sticker information.
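  • A minimal compositing sketch for step A4, assuming the superposition position has already been determined from the feature coordinates and the display time has been parsed from the description file (the array layout and blending are illustrative, not mandated by the application):

```python
import numpy as np

def composite_sticker(frames, fps, sticker_rgba, position, start_s, duration_s):
    """Alpha-blend the sticker onto every frame inside its display window.

    frames        -- list of H x W x 3 uint8 BGR arrays (the first media information)
    sticker_rgba  -- h x w x 4 uint8 array; the alpha channel controls blending
    position      -- (x, y) top-left corner, e.g. derived from the feature coordinates
    start_s, duration_s -- display time taken from the description file
    The sticker is assumed to lie fully inside the frame.
    """
    x, y = position
    h, w = sticker_rgba.shape[:2]
    alpha = sticker_rgba[:, :, 3:4].astype(np.float32) / 255.0
    rgb = sticker_rgba[:, :, :3].astype(np.float32)
    first = max(int(start_s * fps), 0)
    last = min(int((start_s + duration_s) * fps), len(frames))
    for i in range(first, last):
        roi = frames[i][y:y + h, x:x + w].astype(np.float32)
        frames[i][y:y + h, x:x + w] = (alpha * rgb + (1.0 - alpha) * roi).astype(np.uint8)
    return frames
```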
  • Responding to the text information, video-synthesizing the second media information with the first media information according to the configuration of the description file of the second media information includes:
  • The terminal includes: a triggering unit 21, configured to acquire a first operation to trigger collection of the first media information;
  • a detecting unit 22, configured to, when a change of the expression in a face area that meets the preset condition or a change of the user action in the collection frame is detected in the process of collecting the first media information, report the obtained change amount to the server as key information;
  • the receiving unit 23 is configured to receive the second media information corresponding to the key information that is pushed by the server.
  • a synthesizing unit 24 configured to perform video synthesis on the first media information and the second media information.
  • the user is lying on the sofa using a terminal device such as the mobile phone 11.
  • the user interface of the mobile phone 11 is as shown in FIG. 4, and includes various types of application icons, such as a music play icon, a function setting icon, a mail sending and receiving icon, and the like.
  • the user performs the first operation, such as clicking the video processing application icon identified by the A1 with a finger to enter the process of video recording, thereby triggering the collection of the first media information, such as a video. For example, you can record a scene in a room, or take a self-portrait for yourself.
  • the terminal can capture expression changes in the face area, such as smiling, crying, frowning, and the like.
  • the terminal device can also detect changes in user movements in the collection frame (or the frame), such as a scissors hand. It is also possible to combine the expression changes in the face area with the changes in the user's movements, for example, combining the scissors hands and the smiles in the facial expressions for combined recognition.
  • Face recognition technology is based on the facial features of a person. A face image or video stream is collected during video recording, and it is first determined whether a face is present in the video stream; if so, the position and size of the face are given, the position of each main facial organ is located, and the initial position and form of each of the facial features are obtained.
  • When the form changes, for example in a smile, the displacement and deformation of the upper and lower lips relative to their initial form indicate that the facial features have changed, and this change can be used to recognize the change of expression.
  • The face recognition of the embodiments of the present application differs from conventional face recognition. Conventional face recognition recognizes the identity of the user through a constructed face recognition system by comparing the recognized face with known faces, for identity confirmation and identity lookup.
  • Expression recognition can be divided into four stages: acquisition and preprocessing of face images; face detection; expression feature extraction; and expression classification. Relying only on the face recognition and positioning mechanism can be inaccurate, whereas an expression recognition mechanism is a more accurate processing strategy.
  • Expression recognition is closely related to face recognition. For example, the positioning in face detection and face tracking are similar, but the feature extraction is different. For example, the features extracted by face recognition mainly focus on individual differences and characteristics of different faces, while facial expressions exist as interference signals, so face recognition does not pay much attention to facial expressions.
  • The embodiments of the present application need to pay attention to expression changes in order to trigger the corresponding second media information, so individual differences can be ignored and the feature differences of the face in different expression modes are extracted instead. Feature extraction may also be combined with individual differences, or the individual differences may be treated as interference signals (that is, given little attention) in order to improve the accuracy of expression recognition.
  • Feature extraction is the core step in facial expression recognition, which determines the final recognition result and affects the recognition rate.
  • the feature extraction can be divided into: static image feature extraction and moving image feature extraction. In terms of static image feature extraction, the deformation features of the expression (or the transient features of the expression) are extracted.
  • In moving image feature extraction, not only the expression change characteristics of each frame are extracted, but also the motion characteristics of the continuous sequence.
  • Deformation feature extraction can rely on neutral expressions or models to compare the generated expressions with neutral expressions to extract deformation features, while the extraction of motion features is directly dependent on the facial changes produced by the expressions.
  • Expressions can be divided in many ways: 1) by basic expressions, such as happiness, sadness, surprise, fear, anger, and disgust, with different facial expression image libraries created for subsequent matching and recognition; 2) by emotion, such as happy, unhappy, excited, calm, nervous, relaxed, and so on.
  • the key information further includes: text information in the first media information.
  • the detecting unit 22 is further configured to detect the text information in the process of collecting the first media information, and report the text information to the server as key information.
  • In FIG. 5, the text information "A red fire" identified by A2 is included in the video information, and after the video information has been recorded, the sticker information "Red Fire" identified by A2' is added.
  • the sticker information is manually selected from the server material library through the terminal interaction with the server multiple times, and then the sticker information is attached to the video information that has been recorded.
  • FIG. 6 shows another application scenario of the prior art. Specifically, the text information “boyfriend” identified by A3 is included in the video information. After the video information recording is completed, the sticker information "boyfriend” as identified by A3' is added.
  • the sticker information is manually selected from the server material library through the terminal interaction with the server multiple times, and then the sticker information is attached to the video information that has been recorded. This kind of processing is very cumbersome and requires multiple user interactions.
  • Moreover, the sticker found in this way is not necessarily what the user really needs; and even if it is, the user still has to add it manually to the video information that has already been recorded, for example by moving the sticker to the appropriate location in the video, and so on.
  • the video shown in FIG. 7 includes the text information “eat not fat” identified by A4, which is sent to the server as key information.
  • the matching sticker information obtained based on the key information is identified as A4'.
  • the video shown in FIG. 8 includes the text information "boyfriend” identified by A5, which is sent to the server as key information.
  • the matching sticker information obtained based on the key information is identified as A5'.
  • B1 is used to identify the control button during video recording
  • B2 is used to identify the playback button after the video recording is over.
  • FIG. 9 is a schematic diagram of playing back the video after the sticker information and the video have been synthesized at a suitable position and time point during video recording.
  • FIG. 10 is a schematic diagram of playing back a video, after the video has been recorded, in which the sticker information and the video have been synthesized at a suitable position and time point, in an embodiment of the present application. When the voice "year-end award" occurs in the corresponding recorded video information, the text information corresponding to the voice can be displayed on the video interface.
  • The synthesized sticker information is also displayed on the video interface: a dynamic sticker effect displays "a lot of year-end awards" together with a currency-unit indicator, combining it with the text "A lot of year-end awards."
  • sticker shapes can be obtained by recognizing the facial expression or the user's motion.
  • For example, the voice played in the recorded video is "year-end award", as identified by A6.
  • The user action and the voice can be combined; for example, the user action can be a happy blink of the user.
  • During this "happy blink" time period, other sticker information can also be displayed on the video interface, such as "the eyes become two currency symbols" identified by A6'. Besides blinking, this can also be triggered by another user action.
  • For example, a finger snap triggers the display of "the eyes become two currency symbols" as indicated by A6' in FIG. 11, or the display of the sticker information "a lot of year-end awards" as shown in FIG. 10.
  • FIG. 12 shows another application example using the embodiment of the present application.
  • other sticker shapes can also be obtained by recognizing facial expressions.
  • For example, when the voice played in the corresponding recorded video information is "I am so beautiful", as identified by A7, the position of the cheek on the face is recognized, and the sticker information shown by A7' is superimposed at that position.
  • The sticker information is red cheeks (a blush), which belongs to the facial-features sticker type.
  • The receiving unit 23 is further configured to receive a description file of the second media information corresponding to the key information that is pushed by the server.
  • the description file includes: a location of the second media information relative to the first media information, and a display time of the second media information.
  • The synthesizing unit 24 is further configured to perform video synthesis on the first media information and the second media information according to the description file, so that the second media information is displayed at the position of the first media information specified by the description file and within the display time specified by the description file.
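  • The application only states that the description file carries the position of the second media information relative to the first media information and its display time; no concrete format is given. A JSON sketch of what such a file might contain (all field names assumed):

```python
import json

# Hypothetical description file for one sticker; the field names are illustrative.
description_file = json.loads("""
{
  "sticker_id": "blush_01",
  "anchor": "cheeks",
  "offset": {"x": 0, "y": 12},
  "start_seconds": 3.5,
  "duration_seconds": 2.0
}
""")

position = (description_file["offset"]["x"], description_file["offset"]["y"])
display_window = (description_file["start_seconds"],
                  description_file["duration_seconds"])
```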
  • the synthesizing unit 24 includes two specific implementations:
  • the first specific implementation is: when the change of the expression or the change of the user action is performed, acquiring a corresponding feature detection result, and the second media information is according to the feature detection result and the description file of the second media information And configuring video synthesis with the first media information, and displaying the second media information in a location specified by the first media information and within a specified time point or time period.
  • In the second specific implementation, in response to the text information, the second media information is video-composited with the first media information according to the configuration in the description file of the second media information, and the second media information is displayed at the position specified in the first media information within the specified time point or time period.
  • the second media information includes at least one of the following types: a first type of sticker information whose display is triggered by the expression change or the user action change, and a second type of sticker information whose display is triggered by something other than the expression change or the user action change.
  • the synthesizing unit 24 is further configured to: in response to the expression change or the user action change, report the detected feature change amount to the server to request the first type of sticker information and its description file; detect the feature coordinate change caused by the expression change or the user action change, from initial coordinates to target coordinates, and superimpose the first type of sticker information according to the position point located by the target coordinates or the position region defined from the initial coordinates to the target coordinates; parse the received description file of the first type of sticker information to obtain the specified time at which it is to be displayed; and perform video synthesis on the second media information and the first media information according to the position given by the position point or position region and the specified time.
  • the synthesizing unit is further configured to: in response to the text information, report the detected text information to the server to request the second type of sticker information and its description file; parse the received description file of the second type of sticker information to obtain the specified position and specified time at which it is to be displayed, the specified position including the center point at which the sticker information is displayed; and perform video synthesis on the second media information and the first media information according to the specified position and the specified time.
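  • A minimal Python sketch of how the overlay position could be derived from the feature-coordinate change described above; the function and field names are illustrative assumptions and are not part of the application:

    # Derive where a first-type (trigger / facial-feature) sticker is placed
    # from the feature coordinates before and after the expression or action.
    def overlay_region(initial_xy, target_xy):
        (x0, y0), (x1, y1) = initial_xy, target_xy
        left, top = min(x0, x1), min(y0, y1)
        width, height = abs(x1 - x0), abs(y1 - y0)
        # The sticker can be anchored at the target point, or span the region
        # defined from the initial coordinates to the target coordinates.
        return {"anchor": (x1, y1), "box": (left, top, width, height)}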
  • the above terminal may be an electronic device such as a PC, a portable electronic device such as a PAD, a tablet computer, or a laptop computer, or an intelligent mobile terminal such as a mobile phone, and is not limited to the description herein;
  • the server may be constituted by a cluster system, and may be an electronic device in which the functions of the units are integrated into one device or split among separate devices; both the terminal and the server include at least a database for storing data and a processor for data processing, or include a storage medium provided inside the server or an independently provided storage medium.
  • for the processor used for data processing, a microprocessor, a central processing unit (CPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA) may be used when performing processing.
  • the storage medium contains operation instructions; the operation instructions may be computer-executable code, and are used to implement the steps of the information processing method in the foregoing embodiments of the present application.
  • as an example of the terminal and the server as a hardware entity S11, shown in FIG. 15, the apparatus includes a processor 41, a storage medium 42, and at least one external communication interface 43; the processor 41, the storage medium 42, and the external communication interface 43 are all connected by a bus 44.
  • the embodiment of the present application is described below by taking a real application scenario as an example:
  • in FIG. 5, the text information "红红火火" (a New Year wish for prosperity), identified by A2, is included in the video information.
  • After the video recording is completed, the sticker information "红红火火" indicated by A2' is added; this sticker is manually selected from the server material library through multiple interactions between the terminal and the server, and is then attached to the video information that has already been recorded.
  • FIG. 6 shows another application scenario of the prior art. Specifically, the text information “boyfriend” identified by A3 is included in the video information. After the video information recording is completed, the sticker information "boyfriend” as identified by A3' is added.
  • the sticker information is manually selected from the server material library through multiple interactions between the terminal and the server, and is then attached to the video information that has already been recorded. This kind of processing is very cumbersome and requires multiple user interactions.
  • Moreover, the sticker found afterwards is not necessarily what the user really needs; even if it is, the user still has to adjust it manually on the already-recorded video, for example by moving the sticker to a suitable location in the video information.
  • in the scenarios of FIGs. 5-6, the existing video processing technique is as follows: the application (APP) provides some fixed stickers; the user first records the video, then goes to the sticker material library and selects the material he or she considers relevant, and then decides, through complex interactions, at what time point each sticker is added and for how long. Some APPs allow the sticker to be moved, in which case the user must press and hold the sticker and drag it to the desired position. The consequence is that multiple tedious interactions between the terminal and the server are required, the processing efficiency is low, the sticker is selected manually and finally composited only after the video recording is completed, the video processing cost is high, time is wasted, and the result does not necessarily meet the user's needs.
  • for the above application scenarios, the embodiment of the present application provides a video-related real-time animated sticker scheme.
  • With the face recognition and positioning mechanism, the expression recognition mechanism, and the video synthesis processing mechanism of the present application, the user does not need to perform complicated operations to pick, from a pile of materials, the sticker information related to the video material.
  • Instead, once a piece of material video is selected, the corresponding sticker information appears at the corresponding place and at the corresponding time during recording; in other words, the corresponding sticker information is superimposed in real time at the specified position and specified time point during video recording, as shown in FIGs. 7-12.
  • the video information shown in FIG. 7 includes the text information "eat without getting fat" identified by A4, which is sent to the server as key information.
  • the matching sticker information obtained based on the key information is identified as A4'.
  • the video shown in FIG. 8 includes the text information "boyfriend" identified by A5, which is sent to the server as key information.
  • the matching sticker information obtained based on the key information is identified as A5'.
  • in FIGs. 7 and 8, B1 identifies the control button during video recording, and B2 identifies the playback button after video recording ends.
  • FIG. 9 is a schematic diagram of playing back a video after the sticker information and the video have been composited at a suitable position and time point during video recording.
  • in FIG. 9, when the voice "wishing my sisters to eat without getting fat this New Year" is played in the recorded video information, the text of the corresponding voice can be displayed on the video interface.
  • At that time point, the composited sticker information "eat without getting fat", which opens and is shown in the form of a scroll, is also displayed on the video interface as a dynamic sticker effect.
  • FIG. 10 is a schematic diagram of playing back a video after the sticker information and the video have been composited at a suitable position and time point once recording is finished; when the voice "a lot of year-end bonuses" is played in the corresponding recorded video information, the text of the corresponding voice can be displayed on the video interface.
  • At that time point, the composited sticker information is also displayed on the video interface: "a lot of year-end bonuses" is shown as a dynamic sticker effect, combined with a currency-unit indicator such as ¥.
  • Besides the sticker information shown in FIG. 10, other sticker forms can be obtained by recognizing the facial expression or the user action.
  • As shown in FIG. 11, when the voice "a lot of year-end bonuses", identified by A6, is played in the recorded video, the user action can be combined with the voice; for example, the user action can be a happy blink of the user.
  • In this case, during the "happy blink" time period, other sticker information, such as "the eyes become two ¥" identified by A6', may also be displayed on the video interface.
  • Besides blinking, the user action can also be a finger snap; this user action triggers the display of "the eyes become two ¥" indicated by A6' in FIG. 11 or of the sticker information "a lot of year-end bonuses" shown in FIG. 10.
  • FIG. 12 shows another application example using the embodiment of the present application.
  • other sticker shapes can also be obtained by recognizing facial expressions.
  • when the voice played in the corresponding recorded video information is "Am I really so beautiful?", identified by A7, the position of the cheeks is recognized, and the sticker information shown by A7' is superimposed at the cheek position.
  • Specifically, this sticker information is a blushing cheek (blush) of the facial-feature sticker type.
  • Then, during the time period in which "Am I really so beautiful?" appears, the composited sticker information is also displayed on the video interface, and the person's face shows a blush.
  • It should be noted that sticker information is divided into the following categories:
  • A) Normal stickers: for example, gold coins falling from the sky, a shaking red envelope, or a small blooming flower;
  • B) Facial-feature stickers: stickers that can be placed at a specific position of the facial features and that follow the facial features as they move, such as a blush on the cheeks or glasses;
  • C) Trigger stickers: a set of stickers that appear when a specific action is detected; the stickers that appear can be either normal stickers or facial-feature stickers; and
  • D) Background stickers: a few frames of video overlaid on top of the video and played repeatedly, such as the border of "eat without getting fat" in FIG. 8.
  • for trigger stickers and facial-feature stickers, as shown in FIG. 17, the sticker information must first be combined with the feature coordinates and then composited with the recorded video; that is, it involves the feature detector and the material parser in FIG. 17 before the video synthesizer, because expressions, actions, and facial features change, and the coordinates change with them.
  • for the other categories (normal stickers and background stickers), the sticker information is composited directly with the recorded video; that is, only the video synthesizer in FIG. 17 is involved, because the coordinates usually do not change.
  • to achieve the final effects shown in FIGs. 7-12, the technical implementation also includes the following:
  • first, the sticker information of each video is treated as part of the material, included in the material package, and delivered along with the material.
  • the material includes, in addition to the sticker information, a description file of the sticker information and the like.
  • the dynamic material consists of two parts:
  • A) the original form of the sticker information, which has three main formats: i) a static image; ii) an animated Graphics Interchange Format (GIF) image; iii) a video.
  • for picture-type sticker information files (static images and animated GIF images), video synthesis is achieved simply by overlaying images that carry transparency, such as Portable Network Graphics (PNG) images.
  • for video-type sticker information files, which carry no transparency, when a video is used as the material the resolution of the material video is twice that of the captured video: half of the pixels represent the RGB values of the sticker and the other half represent the sticker's transparency.
  • Specifically, video-type sticker information is stored with the RGB and transparency channels separated: the material video is split so that one half stores the material RGB and the other half stores the material transparency, as shown in FIG. 16.
  • RGB is a color standard in which colors are obtained by varying the red (R), green (G), and blue (B) color channels and superimposing them on one another; this standard covers almost all colors that human vision can perceive.
  • when the sticker information is composited with the captured video and the transparency of a pixel of the sticker is a, the RGB value of the composite video = a * (RGB value of the video-type sticker information) + (1 - a) * (RGB value of the captured video).
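  • The blend above can be illustrated with a minimal Python sketch, assuming NumPy and the split-frame layout of FIG. 16 (left half RGB, right half transparency); the function and variable names are illustrative, not part of the application:

    import numpy as np

    def blend_sticker_frame(material_frame, captured_frame):
        """Composite one video-sticker frame onto one captured frame.

        material_frame: H x 2W x 3 uint8 array; the left half holds the
        sticker RGB and the right half encodes its transparency (FIG. 16).
        captured_frame: H x W x 3 uint8 array (the video being recorded).
        """
        h, w2, _ = material_frame.shape
        w = w2 // 2
        sticker_rgb = material_frame[:, :w, :].astype(np.float32)
        # Transparency half: use one channel as alpha in [0, 1].
        alpha = material_frame[:, w:, 0].astype(np.float32) / 255.0
        alpha = alpha[:, :, None]                  # broadcast over RGB
        captured = captured_frame.astype(np.float32)
        # Composite RGB = a * sticker RGB + (1 - a) * captured RGB
        out = alpha * sticker_rgb + (1.0 - alpha) * captured
        return out.astype(np.uint8)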
  • B) the description file of the sticker, which contains: i) the position of the center point where the sticker image appears; and ii) the time at which the sticker image appears. Based on the sticker information and the sticker description file, stickers can be pushed proactively so that the terminal superimposes the appropriate dynamic sticker at the suitable position and time of the video being recorded, without the user manually selecting a sticker.
  • the appearance time of the sticker image includes: a) for a dynamic sticker played once, the start time must be set; b) for a dynamic sticker played repeatedly, the start and end times must be set.
  • C) facial-feature stickers require setting the facial-feature type, which includes: i) top of the head; ii) eyes; iii) cheeks; iv) mouth; v) nose.
  • D) trigger stickers require setting the trigger condition, which includes: i) opening the mouth; ii) blinking; iii) smiling; iv) raising the eyebrows.
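  • The application does not specify a concrete file format for the description file; the following Python dict is only a hypothetical sketch of the fields listed above, with illustrative field names:

    # Hypothetical description file for one sticker, expressed as a Python dict;
    # field names are illustrative assumptions, not taken from the application.
    year_end_bonus_sticker = {
        "sticker_id": "year_end_bonus",
        "type": "normal",                # normal / facial_feature / trigger / background
        "center": {"x": 0.5, "y": 0.8},  # center point where the sticker image appears
        "appear": {"start_ms": 3000, "end_ms": 6000, "loop": True},
        # Only meaningful for facial-feature stickers:
        "facial_feature": None,          # e.g. "top_of_head", "eyes", "cheeks", "mouth", "nose"
        # Only meaningful for trigger stickers:
        "trigger": None,                 # e.g. "open_mouth", "blink", "smile", "raise_eyebrows"
    }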
  • during recording, the dynamic sticker is drawn according to the description file of the sticker information, so that it is visible in real time.
  • for facial-feature and trigger stickers, a face detection algorithm is also included in the system; it should be noted that an existing face detection algorithm is used, and the algorithm itself is not covered by this application.
  • when drawing, the sticker is placed at the appropriate position according to the result of face detection; FIG. 17 is a structural diagram of the entire system. In one embodiment, the modules in FIG. 17 are all located on the terminal side.
  • for trigger stickers and facial-feature stickers, the sticker information is first combined with the feature coordinates and then composited with the recorded video; that is, the feature detector and the material parser are involved before the video synthesizer, because expressions, actions, and facial features change, and the coordinates change with them.
  • the terminal captures the original video through an application (such as a camera application).
  • during capture, the terminal uses the feature detector to detect the face region in each frame of the original video, or the characteristics of the user action within the viewfinder frame, and analyzes the specific feature parameters and their corresponding feature coordinates.
  • the feature coordinates include the initial coordinates and the target coordinates after the deformation.
  • after receiving the sticker information and its description file, sent by the server based on matching the feature parameters, the terminal parses them with the material parser to obtain the sticker information, its attributes, the superimposition position, and the superimposition time point.
  • According to the feature coordinates and the superimposition position and time point indicated by the description file, the video synthesizer then composites the sticker information with the original video being captured to generate a video processing result containing the sticker information.
  • for the other categories (normal stickers and background stickers), the sticker information is composited directly with the recorded video, i.e., only the video synthesizer is involved, because the coordinates usually do not change.
  • the terminal captures the original video through an application (such as a camera application).
  • during capture, the terminal receives the sticker information and its description file, sent by the server after matching the text information in the video, and parses them with the material parser to obtain the sticker information, its attributes, the superimposition position, and the superimposition time point.
  • According to the superimposition position and time point indicated by the description file, the video synthesizer then composites the sticker information with the original video being captured to generate a video processing result containing the sticker information.
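  • A minimal Python sketch of the two compositing flows just described; FeatureDetector, MaterialParser, and VideoSynthesizer are illustrative stand-ins for the modules of FIG. 17, and the method names are assumptions rather than real APIs:

    # Per-frame flow: trigger / facial-feature stickers consult the feature
    # detector first; normal / background stickers go straight to compositing.
    def process_frame(frame, sticker, description, feature_detector,
                      material_parser, video_synthesizer, now_ms):
        parsed = material_parser.parse(sticker, description)
        if parsed.type in ("trigger", "facial_feature"):
            # Coordinates may change with expressions/actions, so detect first.
            features = feature_detector.detect(frame)
            position = features.locate(parsed.anchor)   # e.g. cheeks, eyes
        else:
            # Fixed position taken from the description file.
            position = parsed.center
        if parsed.appear_start <= now_ms <= parsed.appear_end:
            frame = video_synthesizer.overlay(frame, parsed.image, position)
        return frame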
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the units is merely a logical functional division; in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the coupling, direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
  • the units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
  • the functional units in the embodiments of the present application may all be integrated into one processing unit, or each unit may serve separately as one unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • all or part of the steps of the foregoing method embodiments may be implemented by program instructions and related hardware; the foregoing program may be stored in a computer-readable storage medium and, when executed, performs the steps of the foregoing method embodiments; the foregoing storage medium includes any medium that can store program code, such as a removable storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • alternatively, if the above integrated unit of the present application is implemented in the form of a software functional module and is sold or used as a stand-alone product, it may be stored in a computer-readable storage medium.
  • based on such an understanding, the technical solutions of the embodiments of the present application may, in essence or in the part contributing to the prior art, be embodied in the form of a software product; the computer software product is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present application.
  • the foregoing storage medium includes various media that can store program code, such as a removable storage device, a ROM, a RAM, a magnetic disk, or an optical disc.

Abstract

The present application discloses an information processing method and a terminal. The method includes: a terminal acquires a first operation to trigger collection of first media information; when, during collection of the first media information, the terminal detects an expression change in a face region or a user action change in the capture frame that meets a preset condition, the resulting change amount is reported to a server as key information; the terminal receives second media information pushed by the server and corresponding to the key information; and the first media information and the second media information are video-composited.

Description

一种信息处理方法及终端
本申请要求于2016年3月14日提交中国专利局,申请号为201610143985.2,发明名称为“一种信息处理方法及终端”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及通讯技术,尤其涉及一种信息处理方法及终端。
背景技术
用户利用终端如手机中的应用来录制视频,将录制的视频进行信息分享,是常见的信息处理方案。录制的视频还可以附加一些与视频内容相关或不相关的其他信息,来得到合成的视频。
要得到该合成的视频,需要执行附加其他信息的操作是非常复杂和繁琐的,需要用户去素材库选择这些信息,比如,用户需要去素材库选择与视频内容中某段视频信息相关的信息。即便选择的该信息是符合相关性需求的,还要进一步考虑:将该信息附加到视频内容中的哪个位置,哪个时间点等等因素,这种交互模式非常复杂,需要多次交互,势必导致处理效率低下,且来回交互往复,处理的时间成本也很高,最终达到的合成视频的效果也可能差强人意,并不符合真实的用户需求,用户可能重新再合成一次。那么,利用终端进行视频合成的信息处理成本会持续增加。然而,相关技术中,对于该问题,尚无有效解决方案。
发明内容
有鉴于此,本申请实施例提供了一种信息处理方法及终端,至少解决了现有技术存在的问题。
根据本申请的一个方面,提供了一种信息处理方法,所述方法包括:
终端获取第一操作,以触发第一媒体信息的采集;
终端在采集所述第一媒体信息的过程中检测到符合预设条件的人脸区域 内的表情变化或采集框内的用户动作变化时,将检测到的表情变化或用户动作变化的变化量作为关键信息上报给服务器;
终端接收所述服务器推送的与所述关键信息对应的第二媒体信息;以及
将第一媒体信息和第二媒体信息进行视频合成。
根据本申请的另一方面,提供了一种终端,所述终端包括:
触发单元,用于获取第一操作,以触发第一媒体信息的采集;
检测单元,用于采集所述第一媒体信息的过程中检测到符合预设条件的人脸区域内的表情变化或采集框内的用户动作变化时,将检测到的表情变化或用户动作变化的变化量作为关键信息上报给服务器;
接收单元,用于接收所述服务器推送的与所述关键信息对应的第二媒体信息;以及
合成单元,用于将第一媒体信息和第二媒体信息进行视频合成。
根据本申请的另一方面,提供了一种非易失性存储介质,存储有程序,当所述非易失性存储介质存储的程序被包括一个或多个处理器的计算机设备执行时,可使所述计算机设备执行如上所述的信息处理方法。
本申请实施例的信息处理方法包括:终端获取第一操作,以触发第一媒体信息的采集;终端在采集所述第一媒体信息的过程中检测到符合预设条件的人脸区域内的表情变化或采集框内的用户动作变化时,将得到的变化量作为关键信息上报给服务器;终端接收服务器推送的与所述关键信息对应的第二媒体信息;将第一媒体信息和第二媒体信息按照预设配置进行视频合成。
采用本申请实施例,实时采集第一媒体信息的过程中,检测到表情变化或用户动作变化时,基于变化量从服务器得到对应的第二媒体信息,将第一媒体信息和第二媒体信息按照预设配置进行视频合成,从而在第一媒体信息采集结束后,重新回放合成的视频。在合成的视频中,在第一媒体信息的指定位置和指定时间显示有对应的第二媒体信息。由于,第二媒体信息无需用户手动选择和添加,因此,简化了操作流程,提高了处理效率;根据采集第一媒体信息的过程中得到的检测结果(如表情变化或用户动作变化)请求索取的对应第二媒体信息也更符合真实的用户需求。通过以上方法,首先第二 媒体信息的内容本身比较精准,其次第二媒体信息出现的位置和时间也能配合上如表情变化或用户动作变化等检测结果,因此,位置和时间点也精准。不仅减少了多次交互,也不需要后续再次调整和重新合成,降低了视频合成的信息处理成本和时间成本。
附图说明
图1为本申请实施例中进行信息交互的各方硬件实体的示意图;
图2为本申请实施例一的一个实现流程示意图;
图3为应用本申请实施例的应用场景示意图;
图4为应用本申请实施例的触发视频录制的示意图;
图5-6为采用现有技术的多个场景示意图;
图7-12为应用本申请实施例的多个场景示意图;
图13为本申请实施例二的一个实现流程示意图;
图14为本申请实施例三的一个组成结构示意图;
图15为本申请实施例四的一个硬件组成结构示意图;
图16为应用本申请实施例的RGB与透明度分开存储的场景示意图;以及
图17为应用本申请实施例的一个实例的系统架构图。
具体实施方式
下面结合附图对技术方案的实施作进一步的详细描述。
图1为本申请实施例中进行信息交互的各方硬件实体的示意图,图1中包括:服务器11和终端设备21,22,23和24,其中终端设备21,22,23和24通过有线网络或者无线网络与服务器进行信息交互。终端设备可以包括手机、台式机、PC机、一体机等。其中,终端设备中安装有满足用户日常和工作所需的多种应用。如果用户喜欢拍照和录视频,会在终端设备中安装诸如图片处理应用,视频处理应用等应用;出于社交分享的需求,也会安装社交应用。此外,还可以将运用图片处理应用和视频处理应用得到的处理结果通过社交应用进行信息分享。采用本申请实施例,基于上述图1所示的系统,终端设备定期从服务器获取各个应用的更新数据包在本地保存,当需要使用 终端设备上的应用,则开启应用(如视频处理应用),获取第一操作,如开启视频录制的操作,从而触发诸如视频的第一媒体信息的采集。终端设备在采集所述第一媒体信息的过程中检测到符合预设条件的人脸区域内的表情变化或采集框内的用户动作变化时,将得到的变化量作为关键信息上报给服务器。例如,该人脸区域内的表情变化可以为微笑,以及用户动作变化可以为眨眼睛或比划剪刀手。终端接收服务器推送的与所述关键信息对应的诸如贴纸的第二媒体信息;将第一媒体信息和第二媒体信息进行视频合成。采用本申请实施例,实时采集第一媒体信息的过程中,检测到表情变化或用户动作变化时,基于变化量从服务器得到对应的第二媒体信息,将第一媒体信息和第二媒体信息进行视频合成,从而在第一媒体信息采集结束后,重新回放合成的视频。在合成的视频中,在第一媒体信息的指定位置和指定时间显示有对应的第二媒体信息。由于,第二媒体信息无需用户手动选择和添加,因此,简化了操作流程,提高了处理效率;根据采集第一媒体信息的过程中得到的检测结果(如表情变化或用户动作变化)请求索取的对应第二媒体信息也更符合真实的用户需求。通过以上方法,首先第二媒体信息的内容本身比较精准,其次第二媒体信息出现的位置和时间也能配合上如表情变化或用户动作变化等检测结果,因此,位置和时间点也精准。不仅减少了多次交互,也不需要后续再次调整和重新合成,降低了视频合成的信息处理成本和时间成本。
上述图1的例子只是实现本申请实施例的一个系统架构实例,本申请实施例并不限于上述图1所述的系统结构,基于该系统架构,提出本申请各个实施例。
实施例一
本申请实施例的信息处理方法,如图2所示,所述方法包括:
步骤101、终端获取第一操作,以触发第一媒体信息的采集。
一个应用场景中,如图3所示,用户躺在沙发上正在使用如手机11的终端设备。手机11的用户界面如图4所示,其中包含各种类型的应用图标,如音乐播放图标,功能设置图标,邮件收发图标等等。用户执行第一操作,如用手指点击A1标识的视频处理应用图标,进入视频录制的处理过程,从而触 发如视频的第一媒体信息的采集。比如,可以录制一段室内的场景,或者给自己进行自拍等等。
步骤102、终端在采集所述第一媒体信息的过程中检测到符合预设条件的人脸区域内的表情变化或采集框内的用户动作变化时,将得到的变化量作为关键信息上报给服务器。
这里,仍然结合步骤101中的应用场景,在视频录制的处理过程中,通过人脸识别定位机制或表情识别机制,终端可以捕获人脸区域内的表情变化,例如,微笑,哭泣,皱眉等等。此外,终端设备还可以检测采集框(或称取景框)内的用户动作变化,例如比剪刀手。这种检测不限于人脸区域。还可以将人脸区域中的表情变化与用户动作变化进行组合识别,比如,将剪刀手和脸部表情中的微笑相结合进行组合识别。
在人脸识别的过程中,人脸识别技术是基于人的脸部特征,对视频录制中的人脸图像或者视频流进行采集,首先判断视频流中是否存在人脸,如果存在人脸,则进一步的给出脸的位置和大小,及定位出各个主要面部器官的位置信息,得到人脸中五官的各自位置和初始形态,当形态发生变化,如微笑时上下嘴唇的位置会相对初始形态产生位移和形变,则说明人脸五官的面部表情出现变化,也可以通过表情识别机制来识别出表情的变化。本申请实施例的人脸识别有别于常规的人脸识别,常规的人脸识别是为了通过构建的人脸识别系统来识别出用户的身份,是将识别出的人脸与已知人脸进行比对,以便于身份确认以及身份查找。
在表情识别过程中,可以分为四个阶段:如人脸图像的获取与预处理;人脸检测;表情特征提取;以及表情分类。如果仅仅通过人脸识别和定位机制,会存在不精确的问题,而表情识别机制是一种更加准确的处理策略。表情识别与人脸识别密切相关,如在人脸检测中的定位和人脸跟踪这些环节上是类似的,但特征提取上不同。举例来说,人脸识别提取的特征主要关注于不同人脸的个体差异和特性,而面部表情作为干扰信号存在,因此人脸识别不过多关注面部表情。而本申请实施例是需要关注表情的变化来触发对应的第二媒体信息,因此可以忽略个体差异,而关注于提取人脸在不同表情模式 下的差异特征的特征提取。其可以与个体差异相结合,也可以为了提高表情识别精度而将个体差异作为干扰信号处理,即不过多关注个体差异。特征提取是人脸表情识别中的核心步骤,决定着最终的识别结果,影响识别率的高低。其中,所述特征提取可以分为:静态图像特征提取和运动图像特征提取。就静态图像特征提取而言,提取的是表情的形变特征(或称为表情的暂态特征),就运动图像特征提取而言,对于运动图像,不仅要提取每一帧的表情形变特征,还要提取连续序列的运动特征。形变特征提取可以依赖中性表情或模型,从而把产生的表情与中性表情做比较来提取出形变特征,而运动特征的提取则直接依赖于表情产生的面部变化。表情有多种划分方式,1)如按照基本表情划分,如高兴、悲伤、惊讶、恐惧、愤怒和厌恶等,建立不同的人脸表情图像库以便后续的匹配和识别。2)按照情绪分类,如愉快,不愉快,激动,平静,紧张,轻松等。
步骤103、终端接收服务器推送的与所述关键信息对应的第二媒体信息。
该步骤的一种具体实现可以为:步骤102将关键信息上报给服务器之后,服务器根据关键信息从素材库匹配对应的第二媒体信息,例如贴纸信息,并将第二媒体信息推送给终端,以便后续在步骤104中与第一媒体信息进行视频合成。无需用户手动选择贴纸信息,而是根据关键信息匹配后自动推送给终端,在终端采集第一媒体信息(如视频)的过程中自动合成(如将视频和贴纸信息相叠加)视频处理结果,在第一媒体信息(如视频)的指定位置和指定时间显示贴纸信息。
步骤104、将第一媒体信息和第二媒体信息进行视频合成。
在本申请实施例一实施方式中,所述关键信息还包括:第一媒体信息中的文字信息。
该信息处理方法还包括:在采集所述第一媒体信息的过程中检测该文字信息,并将其作为关键信息上报给服务器。
现有技术中,如图5中的文字信息,具体的,在视频信息中包含A2所标识的文字信息“红红火火”。在视频信息录制完成后添加如A2’所标识贴纸信息“红红火火”。该贴纸信息是通过终端多次与服务器的交互,从服务器素材 库手工选取的,之后,在将该贴纸信息附加到已经录制完成的视频信息中。
如图6所示为现有技术的另一个应用场景。具体的,在视频信息中包含A3所标识的文字信息“男朋友”。在视频信息录制完成后添加如A3’所标识贴纸信息“男朋友”。该贴纸信息是通过终端多次与服务器的交互,从服务器素材库手工选取的,之后,在将该贴纸信息附具到已经录制完成的视频信息中。这种处理,非常繁琐,需要多次用户交互,后续找的贴纸也不一定就是用户真正需要的,即便是用户真正需要的,也需要用户手工在已经录制完成的视频信息上手动添加,比如将贴纸移动到视频信息的合适位置上等等。
而采用本申请实施例,如图7所示的视频中包括A4所标识的文字信息“吃不胖”,其作为关键信息被发送到服务器。基于该关键信息得到的匹配贴纸信息如A4’所标识。如图8所示的视频中包括A5所标识的文字信息“男朋友”,其作为关键信息被发送个服务器。基于该关键信息得到的匹配贴纸信息如A5’所标识。在图7和8中,B1用于标识视频录制过程中的控制按钮,B2用于标识视频录制结束后的回放按钮。如图9所示为一个录制视频过程中将贴纸信息与视频在合适的位置和时间点进行视频合成后,回放该视频的示意图。在图9中,对应录制的视频信息中播放语音为“祝姐妹们过年吃不胖”时,在视频界面可以显示对应语音的文字信息,在这个时间点,在视频界面上还显示有合成的贴纸信息以卷轴形式打开并显示的动态贴纸效果“吃不胖”。图10为另一个采用本申请实施例录制视频后,将贴纸信息与视频在合适的位置和时间点进行视频合成后,回放该视频的示意图,其中,当对应录制的视频信息中播放语音为“年终奖多多”时,在视频界面可以显示对应语音的文字信息,在这个时间点,在视频界面上还显示有合成的贴纸信息,以动态贴纸效果显示“年终奖多多”,并配合有货币单位的指示符,例如¥,将其与“年终奖多多”的文字相结合。
当录制的视频中有文字信息“年终奖多多”时,除了如图10所示的显示对应内容的贴纸信息之外,还可以通过识别人脸表情或用户动作而得到其他的贴纸形态。如图11所示,当录制的视频信息中播放语音为如A6标识的“年终奖多多”时,可以使用户动作和语音相结合。例如,该用户动作可以为用 户开心的眨眼。在这种情况下,在视频界面除了可以显示如图10所示的贴纸信息之外,还可以在这个“开心的眨眼”的时间段,在视频界面上还显示有其他的贴纸信息,如A6’标识的“眼睛变成两个¥”。除了眨眼之外,该用户动作还可以为打响指。通过该用户动作触发显示如图11中的A6’标识的“眼睛变成两个¥”或者显示如图10所示的贴纸信息“年终奖多多”。
图12所示为另一个采用本申请实施例的应用实例。在图12中,还可以通过识别人脸表情得到其他的贴纸形态。如图12所示,当对应录制的视频信息中播放语音为如A7标识的“我真有这么漂亮吗”时,识别出人脸脸颊的位置,在人脸脸颊的位置叠加如A7’所示的贴纸信息。具体地,该贴纸信息为五官贴纸类型中的红脸蛋、腮红或红晕。那么,在视频界面在出现“我真有这么漂亮吗”的时间段,在视频界面上还显示有合成的贴纸信息,人脸上有红晕。
实施例二
本申请实施例的信息处理方法,如图13所示,所述方法包括:
步骤201、终端开启应用,获取第一操作,触发第一媒体信息的采集。
一个应用场景中,如图3所示,用户躺在沙发上正在使用如手机11的终端设备。手机11的用户界面如图4所示,其中包含各种类型的应用图标,如音乐播放图标,功能设置图标,邮件收发图标等等。用户执行第一操作,如用手指点击A1标识的视频处理应用图标,进入视频录制的处理过程,从而触发如视频的第一媒体信息的采集。比如,可以录制一段室内的场景,或者给自己进行自拍等等。
步骤202、终端在采集所述第一媒体信息的过程中检测到符合预设条件的人脸区域内的表情变化或采集框内的用户动作变化时,将得到的变化量作为关键信息上报给服务器。
这里,仍然结合步骤201中的应用场景,在视频录制的处理过程中,通过人脸识别定位机制,表情识别机制,终端设备可以捕获人脸区域内的表情变化,如微笑,哭泣,皱眉等等。此外,终端设备还可以检测采集框(或称 取景框)内的用户动作变化,例如比剪刀手。这种检测不限于人脸区域。还可以将人脸区域中的表情变化与用户动作变化进行组合识别,比如,将剪刀手和脸部表情中的微笑相结合进行组合识别。
在人脸识别的过程中,人脸识别技术是基于人的脸部特征,对视频录制中的人脸图像或者视频流进行采集,首先判断视频流中是否存在人脸,如果存在人脸,则进一步的给出脸的位置和大小,及定位出各个主要面部器官的位置信息,得到人脸中五官的各自位置和初始形态,当形态发生变化,如微笑时上下嘴唇的位置会相对初始形态产生位移和形变,则说明人脸五官的面部表情出现变化,也可以通过表情识别机制来识别出表情的变化。本申请实施例的人脸识别有别于常规的人脸识别,常规的人脸识别是为了通过构建的人脸识别系统来识别出用户的身份,是将识别出的人脸与已知人脸进行比对,以便于身份确认以及身份查找。
在表情识别过程中,可以分为四个阶段:如人脸图像的获取与预处理;人脸检测;表情特征提取;以及表情分类。如果仅仅通过人脸识别和定位机制,会存在不精确的问题,而表情识别机制是一种更加准确的处理策略,表情识别与人脸识别密切相关,如在人脸检测中的定位和人脸跟踪这些环节上是类似的,但特征提取上不同。举例来说,人脸识别提取的特征主要关注于不同人脸的个体差异和特性,而面部表情作为干扰信号存在。也就是说,不过多关注面部表情。而本申请实施例是需要关注表情的变化来触发对应的第二媒体信息,因此,可以忽略个体差异,而关注于提取人脸在不同表情模式下的差异特征的特征提取。其可以与个体差异相结合,也可以为了提高表情识别精度而将个体差异作为干扰信号处理,即不过多关注个体差异。特征提取是人脸表情识别中的核心步骤,决定着最终的识别结果,影响识别率的高低。其中,所述特征提取可以分为:静态图像特征提取和运动图像特征提取。就静态图像特征提取而言,提取的是表情的形变特征(或称为表情的暂态特征),就运动图像特征提取而言,对于运动图像,不仅要提取每一帧的表情形变特征,还要提取连续序列的运动特征。形变特征提取可以依赖中性表情或模型,从而把产生的表情与中性表情做比较来提取出形变特征,而运动特征 的提取则直接依赖于表情产生的面部变化。表情有多种划分方式,1)如按照基本表情划分,如高兴、悲伤、惊讶、恐惧、愤怒和厌恶等,建立不同的人脸表情图像库以便后续的匹配和识别。2)按照情绪分类,如愉快,不愉快,激动,平静,紧张,轻松等。
步骤203、服务器从素材库中选取与关键信息对应的第二媒体信息和第二媒体信息的描述文件。
步骤204、终端接收服务器推送的与所述关键信息对应的第二媒体信息和第二媒体信息的描述文件。
该步骤的一种具体实现可以为:在步骤202中将关键信息上报给服务器之后,服务器根据关键信息从素材库匹配对应的第二媒体信息,例如贴纸信息,并将该第二媒体信息推送给终端,以便后续在步骤205中与第一媒体信息进行视频合成。无需用户手动选择贴纸信息,而是根据关键信息匹配后自动推送给终端,在终端采集第一媒体信息(如视频)的过程中自动合成(如将视频和贴纸信息相叠加)视频处理结果,在第一媒体信息(如视频)的指定位置和指定时间显示贴纸信息。
这里,步骤204中,与所述关键信息对应的第二媒体信息的描述文件和第二媒体信息可以同时发送或者分别发送,取决于当时的网络状况,如果网络状况好,则同时发送,如果网络状况不好,为了避免网络不好,丢失数据,可以分别发放。
步骤205、将第一媒体信息和第二媒体信息进行视频合成。
在本申请实施例一实施方式中,所述关键信息还包括:第一媒体信息中的文字信息。
因此该方法还包括:在采集所述第一媒体信息的过程中检测该文字信息,并将其作为关键信息上报给服务器。
现有技术中,如图5中的文字信息,具体的,在视频信息中包含A2所标识的文字信息“红红火火”。在视频信息录制完成后添加如A2’所标识贴纸信息“红红火火”。该贴纸信息是通过终端多次与服务器的交互,从服务器素材库手工选取的,之后,在将该贴纸信息附加到已经录制完成的视频信息中。
如图6所示为现有技术的另一个应用场景,具体的,在视频信息中包含A3所标识的文字信息“男朋友”。在视频信息录制完成后添加如A3’所标识贴纸信息“男朋友”。该贴纸信息是通过终端多次与服务器的交互,从服务器素材库手工选取的,之后,在将该贴纸信息附加到已经录制完成的视频信息中。这种处理,非常繁琐,需要多次用户交互,后续找的贴纸也不一定就是用户真正需要的,即便是用户真正需要的,也需要用户手工在已经录制完成的视频信息上手动添加,比如将贴纸移动到视频信息的合适位置上等等。
而采用本申请实施例,如图7所示的视频中包括A4所标识的文字信息“吃不胖”,其作为关键信息被发送到服务器。基于该关键信息得到的匹配贴纸信息如A4’所标识。如图8所示的视频中包括A5所标识的文字信息“吃不胖”,其作为关键信息被发送到服务器。基于该关键信息得到的匹配贴纸信息如A5’所标识。在图7和8中,B1用于标识视频录制过程中的控制按钮,B2用于标识视频录制结束后的回放按钮。如图9所示为一个录制视频过程中将贴纸信息与视频在合适的位置和时间点进行视频合成后,回放该视频的示意图。在图9中,对应录制的视频信息中播放语音为“祝姐妹们过年吃不胖”时,在视频界面可以显示对应语音的文字信息,在这个时间点,在视频界面上还显示有合成的贴纸信息以卷轴形式打开并显示的动态贴纸效果“吃不胖”。图10为另一个采用本申请实施例录制视频后,将贴纸信息与视频在合适的位置和时间点进行视频合成后,回放该视频的示意图,其中,当对应录制的视频信息中播放语音为“年终奖多多”时,在视频界面可以显示对应语音的文字信息,在这个时间点,在视频界面上还显示有合成的贴纸信息,以动态贴纸效果显示“年终奖多多”,并配合有货币单位的指示符,例如¥,将其与“年终奖多多”的文字相结合。
当录制的视频中有文字信息“年终奖多多”时,除了如图10所示的显示对应内容的贴纸信息之外,还可以通过识别人脸表情或用户动作而得到其他的贴纸形态。如图11所示,当录制的视频中播放语音为如A6标识的“年终奖多多”时,可以使用户动作和语音相结合。例如,该用户动作可以为用户开心的眨眼。在这种情况下,在视频界面除了可以如图10所示的显示对应的 文字信息之外,还可以在这个“开心的眨眼”的时间段,在视频界面上还显示有其他的贴纸信息,如A6’标识的“眼睛变成两个¥”。除了眨眼之外,该用户动作该可以是打响指。通过该用户动作触发显示如图11中的A6’标识的“眼睛变成两个¥”或者如图10所示的贴纸信息“年终奖多多”。
图12所示为另一个采用本申请实施例的应用实例。在图12中,还可以通过识别人脸表情得到其他的贴纸形态。如图12所示,当对应录制的视频信息中播放语音为如A7标识的“我真有这么漂亮吗”时,识别出人脸脸颊的位置,在人脸脸颊的位置叠加如A7’所示的贴纸信息。具体地,该贴纸信息为五官贴纸类型中的红脸蛋、腮红或红晕。那么,在视频界面在出现“我真有这么漂亮吗”的时间段,在视频界面上还显示有合成的贴纸信息,人脸上有红晕。
在本申请实施例一实施方式中,所述将第一媒体信息和第二媒体信息进行视频合成,包括:
第一种实现方案:响应所述表情变化或所述用户动作变化,获取对应的特征检测结果,将所述第二媒体信息按照所述特征检测结果和所述第二媒体信息的描述文件的配置与第一媒体信息进行视频合成,并在指定时间点或时间段内将所述第二媒体信息显示在所述第一媒体信息指定的位置处。
第二种实现方案:响应所述文字信息,将所述第二媒体信息按照所述第二媒体信息的描述文件的配置与第一媒体信息进行视频合成,并在指定时间点或时间段内将所述第二媒体信息显示在所述第一媒体信息指定的位置处。
两种方案的区别在于:第一种方案,需要得到特征坐标(特征检测结果中的部分信息或全部信息),以便结合特征坐标,确定将贴纸信息放到视频信息中哪个合适的指定位置,第二媒体信息可以决定时间点,贴纸信息的摆放是有固定位置和固定时间要求的,根据这个指定位置和时间点,就可以实现在合适的位置,合适的时间点叠加贴纸信息到视频信息上,比如,如图12所示的“脸上出现腮红”这种场景;第二种方案中,可以不考虑特征坐标,第二媒体信息可以决定时间点,和贴纸的大小等属性,还可以包括贴纸信息的中心点位置,贴纸的摆放虽然也有固定位置和固定时间要求的,但是,相比 于第一种方案,更具有任意性,如图7所示,只要出现“吃不胖”就行,不限定“吃不胖”一定在人脸区域的哪个相对位置显示,而第一种方案中,是由表情变化或用户动作变化触发的贴纸请求,因此,务必要配合表情变化或用户动作变化来显示。
在本申请实施例一实施方式中,所述第二多媒体信息包括以下至少一类:1)由所述表情变化或所述用户动作变化触发显示的第一类贴纸信息,如五官贴纸和触发类贴纸;2)由排除所述表情变化或所述用户动作变化触发显示的第二类贴纸信息普通贴纸和背景贴纸。
在本申请实施例一实施方式中,所述响应所述表情变化或所述用户动作变化时,获取对应的特征检测结果,将所述第二媒体信息按照所述特征检测结果和所述第二媒体信息的描述文件的配置与第一媒体信息进行视频合成包括:
a1、响应所述表情变化或所述用户动作变化,将检测到的特征变化量上报服务器,以请求所述第一类贴纸信息和第一类贴纸信息的描述文件;
a2、检测所述表情变化或所述用户动作变化引起的特征坐标变化,由初始坐标定位到目标坐标,以根据目标坐标定位得到的位置点或者由初始坐标至目标坐标界定的位置区域来确定叠加所述第一类贴纸信息的位置;
a3、解析收到的所述第一类贴纸信息的描述文件,得到第一类贴纸信息的显示时间;
a4、按照所述确定的位置以及所述解析的第一类贴纸信息的显示时间,将第二媒体信息与第一媒体信息进行视频合成。
在本申请实施例一实施方式中,所述响应所述文字信息时,将所述第二媒体信息按照所述第二媒体信息的描述文件的配置与第一媒体信息进行视频合成,包括:
b1、响应所述文字信息,将检测到的文字信息上报服务器,以请求所述第二类贴纸信息和第二类贴纸信息的描述文件;
b2、解析收到的所述第二类贴纸信息的描述文件,得到第二类贴纸信息相对于第一媒体信息的位置,以及第二类贴纸信息的显示时间,其中,所述 位置包括第二类贴纸信息显示的中心点位置;
b3、按照所述得到的位置和所述显示时间,将第二媒体信息与第一媒体信息进行视频合成。
实施例三
根据本申请的实施例提供了一种终端。如图14所示,所述终端包括:触发单元21,用于获取第一操作,以触发第一媒体信息的采集;检测单元22,用于在采集所述第一媒体信息的过程中检测人脸区域内的表情变化或采集框内的用户动作变化时,将得到的变化量作为关键信息上报给服务器;及接收单元23,用于接收服务器推送的与所述关键信息对应的第二媒体信息;及合成单元24,用于将第一媒体信息和第二媒体信息进行视频合成。
一个应用场景中,如图3所示,用户躺在沙发上正在使用如手机11的终端设备。手机11的用户界面如图4所示,其中包含各种类型的应用图标,如音乐播放图标,功能设置图标,邮件收发图标等等。用户执行第一操作,如用手指点击A1标识的视频处理应用图标,进入视频录制的处理过程,从而触发如视频的第一媒体信息的采集。比如,可以录制一段室内的场景,或者给自己进行自拍等等。在视频录制的处理过程中,通过人脸识别定位机制或表情识别机制,终端可以捕获到人脸区域内的表情变化,例如微笑,哭泣,皱眉等等。此外,终端设备还可以检测采集框(或称取景框)内的用户动作变化,例如比剪刀手。还可以将人脸区域中的表情变化与用户动作变化进行组合识别,比如,将剪刀手和脸部表情中的微笑相结合进行组合识别。
在人脸识别的过程中,人脸识别技术是基于人的脸部特征,对视频录制中的人脸图像或者视频流进行采集,首先判断视频流中是否存在人脸,如果存在人脸,则进一步的给出脸的位置和大小,及定位出各个主要面部器官的位置信息,得到人脸中五官的各自位置和初始形态,当形态发生变化,如微笑时上下嘴唇的位置会相对初始形态产生位移和形变,则说明人脸五官的面部表情出现变化,也可以通过表情识别机制来识别出表情的变化。本申请实施例的人脸识别有别于常规的人脸识别,常规的人脸识别是为了通过构建的 人脸识别系统来识别出用户的身份,是将识别出的人脸与已知人脸进行比对,以便于身份确认以及身份查找。
在表情识别过程中,可以分为四个阶段:如人脸图像的获取与预处理;人脸检测;表情特征提取;和表情分类。如果仅仅通过人脸识别和定位机制,会存在不精确的问题,而表情识别机制是一种更加准确的处理策略。表情识别与人脸识别密切相关,如在人脸检测中的定位和人脸跟踪这些环节上是类似的,但特征提取上不同。举例来说,人脸识别提取的特征主要关注于不同人脸的个体差异和特性,而面部表情作为干扰信号存在,,因此人脸识别不过多关注面部表情。而本申请实施例是需要关注表情的变化来触发对应的第二媒体信息,因此可以忽略个体差异,而关注于提取人脸在不同表情模式下的差异特征的特征提取。其可以与个体差异相结合,也可以为了提高表情识别精度而将个体差异作为干扰信号处理,即不过多关注个体差异。特征提取是人脸表情识别中的核心步骤,决定着最终的识别结果,影响识别率的高低。其中,所述特征提取可以分为:静态图像特征提取和运动图像特征提取。就静态图像特征提取而言,提取的是表情的形变特征(或称为表情的暂态特征),就运动图像特征提取而言,对于运动图像,不仅要提取每一帧的表情形变特征,还要提取连续序列的运动特征。形变特征提取可以依赖中性表情或模型,从而把产生的表情与中性表情做比较来提取出形变特征,而运动特征的提取则直接依赖于表情产生的面部变化。表情有多种划分方式,1)如按照基本表情划分,如高兴、悲伤、惊讶、恐惧、愤怒和厌恶等,建立不同的人脸表情图像库以便后续的匹配和识别。2)按照情绪分类,如愉快,不愉快,激动,平静,紧张,轻松等。
在本申请实施例一实施方式中,所述关键信息还包括:第一媒体信息中的文字信息。
检测单元22,还用于在采集所述第一媒体信息的过程中检测所述文字信息,并将所述文字信息作为关键信息上报给所述服务器。
现有技术中,如图5中的文字信息,具体的,在视频信息中包含A2所标识的文字信息“红红火火”。在视频信息录制完成后添加如A2’所标识贴纸信 息“红红火火”。该贴纸信息是通过终端多次与服务器的交互,从服务器素材库手工选取的,之后,在将该贴纸信息附加到已经录制完成的视频信息中。
如图6所示为现有技术的另一个应用场景。具体的,在视频信息中包含A3所标识的文字信息“男朋友”。在视频信息录制完成后添加如A3’所标识贴纸信息“男朋友”。该贴纸信息是通过终端多次与服务器的交互,从服务器素材库手工选取的,之后,在将该贴纸信息附具到已经录制完成的视频信息中。这种处理,非常繁琐,需要多次用户交互,后续找的贴纸也不一定就是用户真正需要的,即便是用户真正需要的,也需要用户手工在已经录制完成的视频信息上手动添加,比如将贴纸移动到视频信息的合适位置上等等。
而采用本申请实施例,如图7所示的视频中包括A4所标识的文字信息“吃不胖”,其作为关键信息被发送到服务器。基于该关键信息得到的匹配贴纸信息如A4’所标识。如图8所示的视频中包括A5所标识的文字信息“男朋友”,其作为关键信息被发送到服务器。基于该关键信息得到的匹配贴纸信息如A5’所标识。在图7和8中,B1用于标识视频录制过程中的控制按钮,B2用于标识视频录制结束后的回放按钮。如图9所示为一个录制视频过程中将贴纸信息与视频在合适的位置和时间点进行视频合成后,回放该视频的示意图,在图9中,对应录制的视频信息中播放语音为“祝姐妹们过年吃不胖”时,在视频界面可以显示对应语音的文字信息,在这个时间点,在视频界面上还显示有合成的贴纸信息以卷轴形式打开并显示的动态贴纸效果“吃不胖”。图10为另一个采用本申请实施例录制视频后,将贴纸信息与视频在合适的位置和时间点进行视频合成后,回放该视频的示意图,其中,当对应录制的视频信息中播放语音为“年终奖多多”时,在视频界面可以显示对应语音的文字信息,在这个时间点,在视频界面上还显示有合成的贴纸信息,以动态贴纸效果显示“年终奖多多”,并配合有货币单位的指示符,例如¥,将其与“年终奖多多”的文字相结合。
当录制的视频中有文字信息“年终奖多多”时,除了如图10所示的显示对应内容的贴纸信息之外,还可以通过识别人脸表情或用户动作而得到其他的贴纸形态。如图11所示,当录制的视频中播放语音为如A6标识的“年终 奖多多”时可以使用户动作和语音相结合。例如,该用户动作可以为用户开心的眨眼。在这种情况下,在视频界面除了可以如图10所示的显示贴纸信息之外,还可以在这个“开心的眨眼”的时间段,在视频界面上还显示有其他的贴纸信息,如A6’标识的“眼睛变成两个¥”。除了眨眼之外,该还可以是用户动作还可以为打响指。通过该用户动作触发显示如图11中的A6’标识的“眼睛变成两个¥”或者如图10所示的贴纸信息“年终奖多多”。
图12所示为另一个采用本申请实施例的应用实例。在图12中,还可以通过识别人脸表情得到其他的贴纸形态。如图12所示,当对应录制的视频信息中播放语音为如A7标识的“我真有这么漂亮吗”时,识别出人脸脸颊的位置,在人脸脸颊的位置叠加如A7’所示的贴纸信息。具体地,该贴纸信息为五官贴纸类型中的红脸蛋、腮红或红晕。,那么,在视频界面在出现“我真有这么漂亮吗”的时间段,在视频界面上还显示有合成的贴纸信息,人脸上有红晕。
在本申请实施例一实施方式中,所述接收单元24,进一步用于:接收服务器推送的与所述关键信息对应的第二媒体信息的描述文件。
所述描述文件包括:所述第二媒体信息相对于第一媒体信息的位置,以及第二媒体信息的显示时间。
在本申请实施例一实施方式中,所述合成单元24进一步用于根据所述描述文件将所述第一媒体信息与所述第二媒体信息进行视频合成,以在所述描述文件指定的显示时间内将所述第二媒体信息显示在所述描述文件指定的所述第一媒体信息的位置处。具体地,合成单元24包括两种具体实现:
第一种具体实现:响应所述表情变化或所述用户动作变化时,获取对应的特征检测结果,将所述第二媒体信息按照所述特征检测结果和所述第二媒体信息的描述文件的配置与第一媒体信息进行视频合成,并将所述第二媒体信息显示在所述第一媒体信息指定的位置和指定时间点或时间段内。
第二种具体实现:响应所述文字信息时,将所述第二媒体信息按照所述第二媒体信息的描述文件的配置与第一媒体信息进行视频合成,并将所述第二媒体信息显示在所述第一媒体信息指定的位置和指定时间点或时间段内。
在本申请实施例一实施方式中,所述第二多媒体信息包括以下至少一类:
由所述表情变化或所述用户动作变化触发显示的第一类贴纸信息;
由排除所述表情变化或所述用户动作变化触发显示的第二类贴纸信息。
在本申请实施例一实施方式中,所述合成单元24,进一步用于:
响应所述表情变化或所述用户动作变化,将检测到的特征变化量上报服务器,以请求所述第一类贴纸信息和第一类贴纸信息的描述文件;
检测所述表情变化或所述用户动作变化引起的特征坐标变化,由初始坐标定位到目标坐标,以根据目标坐标定位得到的位置点或者由初始坐标至目标坐标界定的位置区域来叠加所述第一类贴纸信息;
解析收到的所述第一类贴纸信息的描述文件,得到第一类贴纸信息待显示的指定时间;
按照所述位置点或所述位置区域所指定的位置和所述指定时间,将第二媒体信息与第一媒体信息进行视频合成。
在本申请实施例一实施方式中,所述合成单元,进一步用于:
响应所述文字信息,将检测到的文字信息上报服务器后,以请求所述第二类贴纸信息和第二类贴纸信息的描述文件;
解析收到的所述第二类贴纸信息的描述文件,得到第一类贴纸信息待显示的指定位置和指定时间;所述指定位置包括第一类贴纸信息显示的中心点位置;
按照所述指定位置和所述指定时间,将第二媒体信息与第一媒体信息进行视频合成。
实施例四
这里需要指出的是,上述终端可以为PC这种电子设备,还可以为如PAD,平板电脑,手提电脑这种便携电子设备、还可以为如手机这种智能移动终端,不限于这里的描述;所述服务器可以是通过集群系统构成的,为实现各单元功能而合并为一或各单元功能分体设置的电子设备,终端和服务器都至少包括用于存储数据的数据库和用于数据处理的处理器,或者包括设置于服务器 内的存储介质或独立设置的存储介质。
其中,对于用于数据处理的处理器而言,在执行处理时,可以采用微处理器、中央处理器(CPU,Central Processing Unit)、数字信号处理器(DSP,Digital Singnal Processor)或可编程逻辑阵列(FPGA,Field-Programmable Gate Array)实现;对于存储介质来说,包含操作指令,该操作指令可以为计算机可执行代码,通过所述操作指令来实现上述本申请实施例信息处理方法流程中的各个步骤。
该终端和该服务器作为硬件实体S11的一个示例如图15所示。所述装置包括处理器41、存储介质42以及至少一个外部通信接口43;所述处理器41、存储介质42以及外部通信接口43均通过总线44连接。
这里需要指出的是:以上涉及终端和服务器项的描述,与上述方法描述是类似的,同方法的有益效果描述,不做赘述。对于本申请终端和服务器实施例中未披露的技术细节,请参照本申请方法实施例的描述。
以一个现实应用场景为例对本申请实施例阐述如下:
首先对应用场景进行介绍:一,在视频素材中,根据素材演绎的内容,经常会要加一些相关联的动态贴纸,让视频显得更加丰富。A)比如:拜年视频中,当表达恭喜发财时,会希望有一些从天而降的金币;B)又比如:当视频内容想发表主角娇羞状态时,会希望能在用户脸上加上红晕的特效。二,视频中会有和某个明星合演的需求,这个时候也可以将明星直接做成前景的贴纸,然后让用户和明星合影。现有技术中,经视频处理技术得到的示意图如图5-6所示。其中,如图5中的文字信息,具体的,在视频信息中包含A2所标识的文字信息“红红火火”。在视频信息录制完成后添加如A2’所标识贴纸信息“红红火火”。该贴纸信息是通过终端多次与服务器的交互,从服务器素材库手工选取的,之后,在将该贴纸信息附具到已经录制完成的视频信息中。
如图6所示为现有技术的另一个应用场景。具体的,在视频信息中包含A3所标识的文字信息“男朋友”。在视频信息录制完成后添加如A3’所标识贴纸信息“男朋友”。该贴纸信息是通过终端多次与服务器的交互,从服务器素 材库手工选取的,之后,在将该贴纸信息附具到已经录制完成的视频信息中。这种处理,非常繁琐,需要多次用户交互,后续找的贴纸也不一定就是用户真正需要的,即便是用户真正需要的,也需要用户手工在已经录制完成的视频信息上手动添加,比如将贴纸移动到视频信息的合适位置上等等。
上述图5-6的场景中,采用的现有视频处理技术为:应用(APP)会提供一些固定的贴纸,然后用户先录制视频,录制后,由用户自己去贴纸素材库里选择自己认为相关素材,然后通过复杂的交互,决定每张贴纸在什么时间点加上,加多久。且有些APP允许贴纸移动,然后要按住贴纸,拖动决定移动到哪个具体的位置,这样的后果是:需要终端与服务器间多次繁琐的交互,处理效率低下,在视频录制完成后手动选择贴纸再最终合成,视频处理成本高,浪费时间,还不一定符合用户需求。
针对上述应用场景,采用本申请实施例,是一种视频相关实时动效贴纸方案。采用本申请的人脸识别及定位机制,表情识别机制,视频合成处理机制,可以不需要用户通过复杂的操作,去一堆素材中选择某个和该视频素材相关的贴纸信息,而是选择了某段素材视频,在录制的过程中,就能看到对应的地方,在对应的时间,出现对应的贴纸信息,可以称为在视频录制的过程中就实时的在相应指定位置和指定时间点叠加上对应的贴纸信息,如图7-12所示。
采用本申请实施例,如图7所述的视频中包括A4所标识的文字信息“吃不胖”,其作为关键信息被发送到服务器。基于该关键信息得到的匹配贴纸信息如A4’所标识。如图8所述的视频中包括A5所标识的文字信息“男朋友”,其作为关键信息被发送个服务器。基于该关键信息得到的匹配贴纸信息如A5’所标识。在图7-8中,B1用于标识视频录制过程中的控制按钮,B2用于标识视频录制结束后的回放按钮。如图9所示为一个录制视频过程中将贴纸信息与视频在合适的位置和时间点进行视频合成后,回放该视频的示意图,在图9中,对应录制的视频信息中播放语音为“祝姐妹们过年吃不胖”时,在视频界面可以显示对应语音的文字信息,在这个时间点,在视频界面上还显示有合成的贴纸信息以卷轴形式打开并显示的动态贴纸效果“吃不胖”。图10为 另一个采用本申请实施例录制视频后,将贴纸信息与视频在合适的位置和时间点进行视频合成后,回放该视频的示意图,其中,当对应录制的视频信息中播放语音为“年终奖多多”时,在视频界面可以显示对应语音的文字信息,在这个时间点,在视频界面上还显示有合成的贴纸信息,以动态贴纸效果显示“年终奖多多”,并配合有货币单位的指示符,例如¥,将其与“年终奖多多”的文字相结合。
当录制的视频中有文字信息“年终奖多多”时,除了如图10所示的显示对应内容的贴纸信息之外,还可以通过识别人脸表情或用户动作而得到其他的贴纸形态。如图11所示,当录制的视频信息中播放语音为如A6标识的“年终奖多多”时可以使用户动作和语音相结合。例如,该用户动作可以为用户开心的眨眼。在这种情况下,在视频界面除了可以显示如图10所示的贴纸信息之外,还可以在这个“开心的眨眼”的时间段,在视频界面上还显示有其他的贴纸信息,如A6’标识的“眼睛变成两个¥”。除了眨眼之外,该用户动作还可以为打响指。通过该用户动作触发显示如图11中的A6’标识的“眼睛变成两个¥”或者如图10所示的“年终奖多多”。
图12所示为另一个采用本申请实施例的应用实例。在图12中,还可以通过识别人脸表情得到其他的贴纸形态。如图12所示,当对应录制的视频信息中播放语音为如A7标识的“我真有这么漂亮吗”时,识别出人脸脸颊的位置,在人脸脸颊的位置叠加如A7’所示的贴纸信息。具体地,该贴纸信息为五官贴纸类型中的红脸蛋、腮红或红晕。那么,在视频界面在出现“我真有这么漂亮吗”的时间段,在视频界面上还显示有合成的贴纸信息,人脸上有红晕。
在录制完时,对应的贴纸信息也已经出现在视频里了。
这里需要指出的是:贴纸信息分为以下几种:
A)普通贴纸:比如天上掉下的金币,抖动的红包,盛开的一朵小花都属于这类贴纸;
B)五官贴纸:可以指定出现在五官某个具体的位置,并且会跟随五官移动的贴纸。如:脸蛋的红晕,眼镜等;
C)触发类贴纸:检测到某个具体的动作时,出现的变化的一组贴纸,出现的一组贴纸既可以是普通贴纸,也可以是五官贴纸;以及
D)背景贴纸:盖在视频最上方,并且重复播放的几帧视频,如图8中“吃不胖”的边框,如边框类贴纸。
对于上述贴纸信息的四种类型,触发类贴纸和五官类贴纸信息是如图17所示,需要贴纸信息与特征坐标相结合,之后再与录制视频合成,其需要与图17中的特征检测器和素材解析器发生关系,再与视频合成器发生关系,这是因为表情,动作,五官变化这种,坐标都是会变化的。而除去触发类贴纸和五官类贴纸信息之外,其他几种(普通贴纸和背景贴纸),都是贴纸信息直接与录制视频合成。也就是说,仅与图17中的视频合成器发生关系,因为坐标通常不会变。
要想实现上述图7-12所示的最终效果,在技术实现上,还包括以下内容:
一,将每个视频的贴纸信息作为素材的一部分,在素材包中,并且随素材下发。所述素材除了包括贴纸信息,还包括贴纸信息的描述文件等等。
二,动态的素材包括两部分:
A)贴纸信息的原始形态,主要有三种格式:i)静态贴图;ii)动态的图像互换格式(Graphics Interchange Format,Gif)图;iii)视频。其中,对于图片类的贴纸信息文件(如静态贴图和动态的Gif图),用可移植网络图形格式(Portable Network Graphic Format,PNG)图片等带透明度的图片进行叠加来实现视频合成即可;而对于多数视频类的贴纸信息文件(如视频),由于其是不带透明度的,因此当使用视频作为素材时,素材视频的分辨率是拍摄视频的两倍,其中一半的像素用来表示贴纸的RGB值,另一半像素用来表示贴纸的透明度。具体地,视频类贴纸信息的存储方式是:RGB和透明通道分开,将拍摄的视频,分为一半是素材RGB,一半是素材透明度进行存储,如图16所示。RGB这种色彩模式是一种颜色标准,是通过对红(R)、绿(G)、蓝(B)三个颜色通道的变化以及它们相互之间的叠加来得到各式各样的颜色的,RGB即是代表红、绿、蓝三个通道的颜色,这个标准几乎包括了人类视力所能感知的所有颜色。
在贴纸信息和拍摄的视频进行视频叠加的视频合成时,当贴纸信息上某个像素的透明度为a时,合成视频的RGB值=a*视频类贴纸信息的RGB值+(1-a)*拍摄视频的RGB值。
B)贴纸的描述文件,该描述文件包含的信息为:i)贴图出现的中心点位置;ii)贴图出现时间。从而,根据贴纸信息和贴纸描述文件,可以为终端主动推送贴纸,以在正在录制视频的合适的位置,合适的时间点,叠加上合适的动态贴纸,无需用户手动选择贴纸。其中,贴图出现时间包括:a)对于一次性播放的动态贴图,需要设定什么时候开始;b)对于重复播放的动态贴图,需要设定开始和结束时间。
C)五官类贴纸需要设定:五官类型的信息包括:i)头顶;ii)眼睛;iii)脸蛋;iv)嘴巴;v)鼻子。
D)触发类贴纸需要设定:触发条件,触发条件具体包括:i)张嘴;ii)眨眼;iii)微笑;iv)抖眉毛。
E)不同素材的层级关系。
三,素材预览时,解析动态贴纸的文件。
四,录制过程中,按照贴纸信息的描述文件,绘制动态贴纸,已达到实时可见的目的,如果是五官类和触发类贴纸,在系统构成中还包括了人脸检测算法。需要注意的是该人脸检测算法使用现有的人脸检测算法,算法本身不包括在专利内。绘制贴纸时根据人脸检测的结果将贴纸绘制在恰当的位置,如图17所示为整个系统的结构示意图。在一个实施例中,图17中的模块都位于终端侧。对于触发类贴纸和五官类贴纸信息,需要贴纸信息与特征坐标相结合,之后再与录制视频合成,即:需要与特征检测和素材解析器发生关系,再与视频合成器发生关系,这是因为表情,动作,五官变化这种,坐标都是会变化的。具体的,终端通过应用(如相机应用)拍摄原始视频,在拍摄过程中,终端通过特征检测器来检测原始视频中每一帧图像中的人脸区域或者取景框内用户动作的特征,分析出具体的特征参数及其对应的特征坐标。特征坐标包括初始坐标和形变后的目的坐标。终端收到基于服务器匹配特征参数后发送的贴纸信息和贴纸信息的描述文件后,通过素材解析器对贴纸信 息和贴纸信息的描述文件进行解析,以得到贴纸信息及其属性和叠加位置和叠加时间点等信息,将贴纸信息按照特征坐标,贴纸信息的描述文件指示的叠加位置和叠加时间点等信息,通过视频合成器将贴纸信息与正在拍摄的原始视频进行视频合成,生成含有贴纸信息的视频处理结果。而除去触发类贴纸和五官类贴纸信息之外,其他几种(普通贴纸和背景贴纸),都是贴纸信息直接与录制视频合成,即:仅与视频合成器发生关系,因为坐标通常不会变,具体的,终端通过应用(如相机应用)拍摄原始视频,在拍摄过程中,终端收到基于服务器匹配视频中文字信息后发送的贴纸信息和贴纸信息的描述文件后,通过素材解析器对贴纸信息和贴纸信息的描述文件进行解析,以得到贴纸信息及其属性和叠加位置和叠加时间点等信息,将贴纸信息按照贴纸信息的描述文件指示的叠加位置和叠加时间点等信息,通过视频合成器将贴纸信息与正在拍摄的原始视频进行视频合成,生成含有贴纸信息的视频处理结果。
五,最终在录制时将动态贴纸一起录制到视频中,已完成视频录制。
在本申请所提供的几个实施例中,应该理解到,所揭露的设备和方法,可以通过其它的方式实现。以上所描述的设备实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,如:多个单元或组件可以结合,或可以集成到另一个系统,或一些特征可以忽略,或不执行。另外,所显示或讨论的各组成部分相互之间的耦合、或直接耦合、或通信连接可以是通过一些接口,设备或单元的间接耦合或通信连接,可以是电性的、机械的或其它形式的。
上述作为分离部件说明的单元可以是、或也可以不是物理上分开的,作为单元显示的部件可以是、或也可以不是物理单元,即可以位于一个地方,也可以分布到多个网络单元上;可以根据实际的需要选择其中的部分或全部单元来实现本实施例方案的目的。
另外,在本申请各实施例中的各功能单元可以全部集成在一个处理单元中,也可以是各单元分别单独作为一个单元,也可以两个或两个以上单元集成在一个单元中;上述集成的单元既可以采用硬件的形式实现,也可以采用 硬件加软件功能单元的形式实现。
本领域普通技术人员可以理解:实现上述方法实施例的全部或部分步骤可以通过程序指令相关的硬件来完成,前述的程序可以存储于一计算机可读取存储介质中,该程序在执行时,执行包括上述方法实施例的步骤;而前述的存储介质包括:移动存储设备、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。
或者,本申请上述集成的单元如果以软件功能模块的形式实现并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机、服务器、或者网络设备等)执行本申请各个实施例所述方法的全部或部分。而前述的存储介质包括:移动存储设备、ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (17)

  1. 一种信息处理方法,其特征在于,包括:
    终端获取第一操作,以触发第一媒体信息的采集;
    终端在采集所述第一媒体信息的过程中检测到符合预设条件的人脸区域内的表情变化或采集框内的用户动作变化时,将检测到的表情变化或用户动作变化的变化量作为关键信息上报给服务器;
    终端接收所述服务器推送的与所述关键信息对应的第二媒体信息;以及
    将第一媒体信息和第二媒体信息进行视频合成。
  2. 根据权利要求1所述的方法,其特征在于,所述关键信息还包括:所述第一媒体信息中的文字信息;以及
    所述方法还包括:
    在采集所述第一媒体信息的过程中检测所述文字信息,并将检测的文字信息作为关键信息上报给所述服务器。
  3. 根据权利要求2所述的方法,其特征在于,将第一媒体信息和第二媒体信息进行视频合成之前,所述方法还包括:
    终端接收服务器推送的与所述关键信息对应的第二媒体信息的描述文件。
  4. 根据权利要求3所述的方法,其特征在于,所述描述文件包括:所述第二媒体信息相对于第一媒体信息的位置,以及第二媒体信息的显示时间。
  5. 根据权利要求4所述的方法,其特征在于,所述将第一媒体信息和第二媒体信息进行视频合成包括:
    根据所述描述文件将所述第一媒体信息与所述第二媒体信息进行视频合成,以在所述描述文件指定的显示时间内将所述第二媒体信息显示在所述描述文件指定的所述第一媒体信息的位置处。
  6. 根据权利要求2至5中任一项所述的方法,其特征在于,所述第二多媒体信息包括以下至少一类:
    由所述表情变化或所述用户动作变化触发显示的第一类贴纸信息;以及
    由所述文字信息触发显示的第二类贴纸信息。
  7. 根据权利要求6所述的方法,其特征在于,当所述第二多媒体信息为第一类贴纸信息时,所述将第一媒体信息和第二媒体信息进行视频合成包括:
    确定所述表情变化或所述用户动作变化的特征初始坐标和特征目标坐标,以根据所述特征目标坐标定位的位置点或者由所述特征初始坐标至所述特征目标坐标确定的位置区域来确定叠加所述第一类贴纸信息的位置;
    解析收到的所述第一类贴纸信息的描述文件,得到第一类贴纸信息的显示时间;
    按照所述确定的位置以及所述解析的第一类贴纸信息的显示时间,将第一类贴纸信息与第一媒体信息进行视频合成。
  8. 根据权利要求6所述的方法,其特征在于,当所述第二多媒体信息为第二类贴纸信息时,所述将第一媒体信息和第二媒体信息进行视频合成包括:
    解析收到的所述第二类贴纸信息的描述文件,得到第二类贴纸信息相对于第一媒体信息的位置,以及第二类贴纸信息的显示时间;以及
    按照所述得到的位置和所述显示时间,将第二类贴纸信息与第一媒体信息进行视频合成。
  9. 一种终端,包括:
    触发单元,用于获取第一操作,以触发第一媒体信息的采集;
    检测单元,用于采集所述第一媒体信息的过程中检测到符合预设条件的人脸区域内的表情变化或采集框内的用户动作变化时,将检测到的表情变化或用户动作变化的变化量作为关键信息上报给服务器;
    接收单元,用于接收所述服务器推送的与所述关键信息对应的第二媒体信息;以及
    合成单元,用于将第一媒体信息和第二媒体信息进行视频合成。
  10. 根据权利要求9所述的终端,其特征在于,所述关键信息还包括:所述第一媒体信息中的文字信息;以及
    所述检测单元,还用于在采集所述第一媒体信息的过程中检测所述文字信息,并将所述文字信息作为关键信息上报给所述服务器。
  11. 根据权利要求10所述的终端,其特征在于,所述接收单元,进一步用于:接收服务器推送的与所述关键信息对应的第二媒体信息的描述文件。
  12. 根据权利要求11所述的终端,其特征在于,所述描述文件包括:所述第二媒体信息相对于第一媒体信息的位置,以及第二媒体信息的显示时间。
  13. 根据权利要求12所述的终端,其特征在于,所述合成单元,进一步用于:
    根据所述描述文件将所述第一媒体信息与所述第二媒体信息进行视频合成,以在所述描述文件指定的显示时间内将所述第二媒体信息显示在所述描述文件指定的所述第一媒体信息的位置处。
  14. 根据权利要求10至13中任一项所述的终端,其特征在于,所述第二多媒体信息包括以下至少一类:
    由所述表情变化或所述用户动作变化触发显示的第一类贴纸信息;以及
    由所述文字信息触发显示的第二类贴纸信息。
  15. 根据权利要求14所述的终端,其特征在于,当所述第二多媒体信息为第一类贴纸信息时,所述合成单元,进一步用于:
    确定所述表情变化或所述用户动作变化的特征初始坐标和特征目标坐标,以根据所述特征目标坐标定位的位置点或者由所述特征初始坐标至所述特征目标坐标确定的位置区域来确定叠加所述第一类贴纸信息的位置;
    解析收到的所述第一类贴纸信息的描述文件,得到第一类贴纸信息的显示时间;以及
    按照所述确定的位置以及所述解析的第一类贴纸信息的显示时间,将第一类贴纸信息与第一媒体信息进行视频合成。
  16. 根据权利要求14所述的终端,其特征在于,当所述第二多媒体信息为第二类贴纸信息时,所述合成单元,进一步用于:
    解析收到的所述第二类贴纸信息的描述文件,得到第二类贴纸信息相对于第一媒体信息的位置,以及第二类贴纸信息的显示时间;以及
    按照所述得到的位置和所述显示时间,将第二类贴纸信息与第一媒体信息进行视频合成。
  17. 一种非易失性存储介质,存储有程序,当所述非易失性存储介质存储的程序被包括一个或多个处理器的计算机设备执行时,可使所述计算机设备执行如权利要求1至8中任一项所述信息处理方法。
PCT/CN2017/076576 2016-03-14 2017-03-14 一种信息处理方法及终端 WO2017157272A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
KR1020187026680A KR102135215B1 (ko) 2016-03-14 2017-03-14 정보 처리 방법 및 단말
JP2018527883A JP2019504532A (ja) 2016-03-14 2017-03-14 情報処理方法及び端末
US15/962,663 US11140436B2 (en) 2016-03-14 2018-04-25 Information processing method and terminal

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610143985.2A CN105791692B (zh) 2016-03-14 2016-03-14 一种信息处理方法、终端及存储介质
CN201610143985.2 2016-03-14

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/962,663 Continuation US11140436B2 (en) 2016-03-14 2018-04-25 Information processing method and terminal

Publications (1)

Publication Number Publication Date
WO2017157272A1 true WO2017157272A1 (zh) 2017-09-21

Family

ID=56392673

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/076576 WO2017157272A1 (zh) 2016-03-14 2017-03-14 一种信息处理方法及终端

Country Status (5)

Country Link
US (1) US11140436B2 (zh)
JP (1) JP2019504532A (zh)
KR (1) KR102135215B1 (zh)
CN (1) CN105791692B (zh)
WO (1) WO2017157272A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107995499A (zh) * 2017-12-04 2018-05-04 腾讯科技(深圳)有限公司 媒体数据的处理方法、装置及相关设备

Families Citing this family (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105791692B (zh) * 2016-03-14 2020-04-07 腾讯科技(深圳)有限公司 一种信息处理方法、终端及存储介质
CN106303293B (zh) * 2016-08-15 2019-07-30 Oppo广东移动通信有限公司 视频处理方法、装置及移动终端
CN107343220B (zh) 2016-08-19 2019-12-31 北京市商汤科技开发有限公司 数据处理方法、装置和终端设备
CN106210545A (zh) * 2016-08-22 2016-12-07 北京金山安全软件有限公司 一种视频拍摄方法、装置及电子设备
CN106373170A (zh) * 2016-08-31 2017-02-01 北京云图微动科技有限公司 一种视频制作方法及装置
US11049147B2 (en) * 2016-09-09 2021-06-29 Sony Corporation System and method for providing recommendation on an electronic device based on emotional state detection
CN106339201A (zh) * 2016-09-14 2017-01-18 北京金山安全软件有限公司 贴图处理方法、装置和电子设备
CN106341608A (zh) * 2016-10-28 2017-01-18 维沃移动通信有限公司 一种基于情绪的拍摄方法及移动终端
CN106683120B (zh) * 2016-12-28 2019-12-13 杭州趣维科技有限公司 追踪并覆盖动态贴纸的图像处理方法
JP6520975B2 (ja) * 2017-03-16 2019-05-29 カシオ計算機株式会社 動画像処理装置、動画像処理方法及びプログラム
US10515199B2 (en) * 2017-04-19 2019-12-24 Qualcomm Incorporated Systems and methods for facial authentication
CN107529029A (zh) * 2017-07-31 2017-12-29 深圳回收宝科技有限公司 一种在检测文件中添加标签的方法、设备以及存储介质
KR101968723B1 (ko) * 2017-10-18 2019-04-12 네이버 주식회사 카메라 이펙트를 제공하는 방법 및 시스템
CN108024071B (zh) * 2017-11-24 2022-03-08 腾讯数码(天津)有限公司 视频内容生成方法、视频内容生成装置及存储介质
US10410060B2 (en) * 2017-12-14 2019-09-10 Google Llc Generating synthesis videos
CN108388557A (zh) * 2018-02-06 2018-08-10 腾讯科技(深圳)有限公司 消息处理方法、装置、计算机设备和存储介质
CN108737715A (zh) * 2018-03-21 2018-11-02 北京猎户星空科技有限公司 一种拍照方法及装置
CN113658298A (zh) * 2018-05-02 2021-11-16 北京市商汤科技开发有限公司 特效程序文件包的生成及特效生成方法与装置
CN110163861A (zh) 2018-07-11 2019-08-23 腾讯科技(深圳)有限公司 图像处理方法、装置、存储介质和计算机设备
CN108958610A (zh) * 2018-07-27 2018-12-07 北京微播视界科技有限公司 基于人脸的特效生成方法、装置和电子设备
CN109213932B (zh) * 2018-08-09 2021-07-09 咪咕数字传媒有限公司 一种信息推送方法及装置
CN109388501B (zh) * 2018-08-31 2024-03-05 平安科技(深圳)有限公司 基于人脸识别请求的通信匹配方法、装置、设备及介质
CN109379623A (zh) * 2018-11-08 2019-02-22 北京微播视界科技有限公司 视频内容生成方法、装置、计算机设备和存储介质
CN109587397A (zh) * 2018-12-03 2019-04-05 深圳市优炫智科科技有限公司 基于人脸检测动态贴图的儿童相机及其动态贴图方法
CN109660855B (zh) * 2018-12-19 2021-11-02 北京达佳互联信息技术有限公司 贴纸显示方法、装置、终端及存储介质
CN111695376A (zh) * 2019-03-13 2020-09-22 阿里巴巴集团控股有限公司 视频处理方法、视频处理装置及电子设备
CN110139170B (zh) * 2019-04-08 2022-03-29 顺丰科技有限公司 视频贺卡生成方法、装置、系统、设备及存储介质
CN112019919B (zh) * 2019-05-31 2022-03-15 北京字节跳动网络技术有限公司 视频贴纸的添加方法、装置及电子设备
CN110784762B (zh) * 2019-08-21 2022-06-21 腾讯科技(深圳)有限公司 一种视频数据处理方法、装置、设备及存储介质
CN110782510A (zh) * 2019-10-25 2020-02-11 北京达佳互联信息技术有限公司 一种贴纸生成方法及装置
CN111177542B (zh) * 2019-12-20 2021-07-20 贝壳找房(北京)科技有限公司 介绍信息的生成方法和装置、电子设备和存储介质
US11675494B2 (en) * 2020-03-26 2023-06-13 Snap Inc. Combining first user interface content into second user interface
CN111556335A (zh) * 2020-04-15 2020-08-18 早安科技(广州)有限公司 一种视频贴纸处理方法及装置
KR20210135683A (ko) * 2020-05-06 2021-11-16 라인플러스 주식회사 인터넷 전화 기반 통화 중 리액션을 표시하는 방법, 시스템, 및 컴퓨터 프로그램
CN111597984B (zh) * 2020-05-15 2023-09-26 北京百度网讯科技有限公司 贴纸测试方法、装置、电子设备及计算机可读存储介质
CN113709573B (zh) 2020-05-21 2023-10-24 抖音视界有限公司 配置视频特效方法、装置、设备及存储介质
CN111627115A (zh) * 2020-05-26 2020-09-04 浙江商汤科技开发有限公司 互动合影方法及装置、互动装置以及计算机存储介质
CN111757175A (zh) * 2020-06-08 2020-10-09 维沃移动通信有限公司 视频处理方法及装置
CN111726701B (zh) * 2020-06-30 2022-03-04 腾讯科技(深圳)有限公司 信息植入方法、视频播放方法、装置和计算机设备
KR20230163528A (ko) * 2021-03-31 2023-11-30 스냅 인코포레이티드 맞춤화가능한 아바타 생성 시스템
US11941227B2 (en) 2021-06-30 2024-03-26 Snap Inc. Hybrid search system for customizable media
US11689765B1 (en) * 2021-12-31 2023-06-27 The Nielsen Company (Us), Llc Methods and apparatus for obfuscated audience identification
CN114513705A (zh) * 2022-02-21 2022-05-17 北京字节跳动网络技术有限公司 视频显示方法、装置和存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1870744A (zh) * 2005-05-25 2006-11-29 冲电气工业株式会社 图像合成装置、通信终端、图像通信系统以及聊天服务器
CN101453573A (zh) * 2007-12-04 2009-06-10 奥林巴斯映像株式会社 图像显示装置和照相机、图像显示方法、程序及图像显示系统
JP2010066844A (ja) * 2008-09-09 2010-03-25 Fujifilm Corp 動画コンテンツの加工方法及び装置、並びに動画コンテンツの加工プログラム
TW201021550A (en) * 2008-11-19 2010-06-01 Altek Corp Emotion-based image processing apparatus and image processing method
CN102427553A (zh) * 2011-09-23 2012-04-25 Tcl集团股份有限公司 一种电视节目播放方法、系统及电视机和服务器
CN105791692A (zh) * 2016-03-14 2016-07-20 腾讯科技(深圳)有限公司 一种信息处理方法及终端

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006211120A (ja) * 2005-01-26 2006-08-10 Sharp Corp 文字情報表示機能を備えた映像表示システム
JP4356645B2 (ja) * 2005-04-28 2009-11-04 ソニー株式会社 字幕生成装置及び方法
EP2194509A1 (en) * 2006-05-07 2010-06-09 Sony Computer Entertainment Inc. Method for providing affective characteristics to computer generated avatar during gameplay
US8243116B2 (en) * 2007-09-24 2012-08-14 Fuji Xerox Co., Ltd. Method and system for modifying non-verbal behavior for social appropriateness in video conferencing and other computer mediated communications
US20100257462A1 (en) * 2009-04-01 2010-10-07 Avaya Inc Interpretation of gestures to provide visual queues
JP2013046358A (ja) * 2011-08-26 2013-03-04 Nippon Hoso Kyokai <Nhk> コンテンツ再生装置及びコンテンツ再生プログラム
US20140111542A1 (en) * 2012-10-20 2014-04-24 James Yoong-Siang Wan Platform for recognising text using mobile devices with a built-in device video camera and automatically retrieving associated content based on the recognised text
US9251405B2 (en) * 2013-06-20 2016-02-02 Elwha Llc Systems and methods for enhancement of facial expressions
US20160196584A1 (en) 2015-01-06 2016-07-07 Facebook, Inc. Techniques for context sensitive overlays
US9697648B1 (en) * 2015-12-23 2017-07-04 Intel Corporation Text functions in augmented reality

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1870744A (zh) * 2005-05-25 2006-11-29 冲电气工业株式会社 图像合成装置、通信终端、图像通信系统以及聊天服务器
CN101453573A (zh) * 2007-12-04 2009-06-10 奥林巴斯映像株式会社 图像显示装置和照相机、图像显示方法、程序及图像显示系统
JP2010066844A (ja) * 2008-09-09 2010-03-25 Fujifilm Corp 動画コンテンツの加工方法及び装置、並びに動画コンテンツの加工プログラム
TW201021550A (en) * 2008-11-19 2010-06-01 Altek Corp Emotion-based image processing apparatus and image processing method
CN102427553A (zh) * 2011-09-23 2012-04-25 Tcl集团股份有限公司 一种电视节目播放方法、系统及电视机和服务器
CN105791692A (zh) * 2016-03-14 2016-07-20 腾讯科技(深圳)有限公司 一种信息处理方法及终端

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107995499A (zh) * 2017-12-04 2018-05-04 腾讯科技(深圳)有限公司 媒体数据的处理方法、装置及相关设备
CN107995499B (zh) * 2017-12-04 2021-07-23 腾讯科技(深圳)有限公司 媒体数据的处理方法、装置及相关设备

Also Published As

Publication number Publication date
CN105791692A (zh) 2016-07-20
CN105791692B (zh) 2020-04-07
KR102135215B1 (ko) 2020-07-17
KR20180112848A (ko) 2018-10-12
US20180249200A1 (en) 2018-08-30
JP2019504532A (ja) 2019-02-14
US11140436B2 (en) 2021-10-05

Similar Documents

Publication Publication Date Title
WO2017157272A1 (zh) 一种信息处理方法及终端
US11321385B2 (en) Visualization of image themes based on image content
CN110612533B (zh) 用于根据表情对图像进行识别、排序和呈现的方法
TWI253860B (en) Method for generating a slide show of an image
WO2020063319A1 (zh) 动态表情生成方法、计算机可读存储介质和计算机设备
CN105190480B (zh) 信息处理设备和信息处理方法
US8416332B2 (en) Information processing apparatus, information processing method, and program
EP3195601B1 (en) Method of providing visual sound image and electronic device implementing the same
US20220174237A1 (en) Video special effect generation method and terminal
WO2022095757A1 (zh) 图像渲染方法和装置
US11394888B2 (en) Personalized videos
US20180352191A1 (en) Dynamic aspect media presentations
WO2018177134A1 (zh) 用户生成内容处理方法、存储介质和终端
CN115529378A (zh) 一种视频处理方法及相关装置
US20170061642A1 (en) Information processing apparatus, information processing method, and non-transitory computer readable medium
JP2010251841A (ja) 画像抽出プログラムおよび画像抽出装置
US20230043683A1 (en) Determining a change in position of displayed digital content in subsequent frames via graphics processing circuitry
JP6166070B2 (ja) 再生装置および再生方法
CN115225756A (zh) 确定目标对象的方法、拍摄方法和装置
JP2014110469A (ja) 電子機器、画像処理方法、及びプログラム
JP2017211995A (ja) 再生装置、再生方法、再生プログラム、音声要約装置、音声要約方法および音声要約プログラム
US20230326095A1 (en) Overlaying displayed digital content with regional transparency and regional lossless compression transmitted over a communication network via processing circuitry
US20230326094A1 (en) Integrating overlaid content into displayed data via graphics processing circuitry and processing circuitry using a computing memory and an operating system memory
CN113873135A (zh) 一种图像获得方法、装置、电子设备及存储介质
CN117688196A (zh) 图像推荐方法、相关装置

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 2018527883

Country of ref document: JP

ENP Entry into the national phase

Ref document number: 20187026680

Country of ref document: KR

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 1020187026680

Country of ref document: KR

NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17765814

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 17765814

Country of ref document: EP

Kind code of ref document: A1