WO2017157272A1 - An information processing method and terminal - Google Patents
An information processing method and terminal
- Publication number
- WO2017157272A1 (PCT/CN2017/076576)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- information
- sticker
- media information
- video
- type
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/4104—Peripherals receiving signals from specially adapted client devices
- H04N21/4126—The peripheral being portable, e.g. PDAs or mobile phones
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/60—Editing figures and text; Combining figures or text
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/433—Content storage operation, e.g. storage operation in response to a pause request, caching operations
- H04N21/4334—Recording operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/4104—Peripherals receiving signals from specially adapted client devices
- H04N21/4122—Peripherals receiving signals from specially adapted client devices additional display device, e.g. video projector
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/414—Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance
- H04N21/41407—Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance embedded in a portable device, e.g. video client on a mobile phone, PDA, laptop
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/4223—Cameras
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/478—Supplemental services, e.g. displaying phone caller identification, shopping application
- H04N21/4788—Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/61—Control of cameras or camera modules based on recognised objects
- H04N23/611—Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/62—Control of parameters via user interfaces
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/80—Camera processing pipelines; Components thereof
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
- H04N21/44016—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/442—Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
- H04N21/44213—Monitoring of end-user related data
- H04N21/44218—Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program
Description
- The present application relates to communication technologies, and in particular, to an information processing method and a terminal.
- A recorded video can be attached with other information, related or unrelated to the video content, to obtain a synthesized video. In the related art, however, the operation of attaching such information is complicated and cumbersome: the user needs to go to a material library and manually select information related to particular video content.
- This interaction mode is very complicated and requires multiple back-and-forth interactions, which inevitably leads to low processing efficiency and high processing time cost.
- Moreover, the finally synthesized video may be unsatisfactory and fail to meet the real user requirements, so the user may re-synthesize it, and the information processing cost of video synthesis on the terminal keeps increasing. In the related art, there is no effective solution to this problem.
- The embodiments of the present application provide an information processing method and a terminal, which solve at least the above problems existing in the prior art.
- An information processing method, comprising:
- the terminal acquires a first operation to trigger collection of first media information;
- when the terminal detects, in the process of collecting the first media information, a change of the expression in a face area that meets a preset condition or a change of the user action in the collection frame, the detected change amount of the expression change or the user action change is reported to the server as key information;
- the terminal receives second media information that is pushed by the server and corresponds to the key information; and
- the first media information and the second media information are video-synthesized.
- A terminal, comprising:
- a triggering unit, configured to acquire a first operation to trigger collection of first media information;
- a detecting unit, configured to, when a change of the expression in a face area that meets a preset condition or a change of the user action in the collection frame is detected in the process of collecting the first media information, report the detected change amount of the expression change or the user action change to the server as key information;
- a receiving unit, configured to receive second media information that is pushed by the server and corresponds to the key information; and
- a synthesizing unit, configured to perform video synthesis on the first media information and the second media information.
- A nonvolatile storage medium storing a program: when the program stored in the nonvolatile storage medium is executed by a computer device including one or more processors, the computer device is caused to perform the information processing method described above.
- The information processing method in the embodiments of the present application includes: the terminal acquires a first operation to trigger collection of first media information; when the terminal detects, in the process of collecting the first media information, a change of the expression in a face area that meets a preset condition or a change of the user action in the collection frame, the obtained change amount is reported to the server as key information; the terminal receives second media information corresponding to the key information pushed by the server; and the first media information and the second media information are video-synthesized according to a preset configuration.
- With the embodiments of the present application, the corresponding second media information is obtained from the server based on the change amount, the first media information and the second media information are video-synthesized according to the preset configuration, and the synthesized video is played back after the first media information is collected, with the corresponding second media information displayed at a specified position and a specified time of the first media information.
- Because the second media information does not need to be manually selected and added by the user, the operation flow is simplified and the processing efficiency is improved; and because the second media information is requested based on the detection result (such as the expression change or the user action change) obtained in the process of collecting the first media information, the second media information also better matches real user needs.
- The position and time of the second media information can likewise be matched with detection results such as expression changes or user action changes, so the position and time point are also accurate. This not only reduces multiple interactions, but also avoids subsequent re-adjustment and re-synthesis, which reduces the information processing cost and time cost of video synthesis.
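- For orientation, the end-to-end terminal-side flow described above can be sketched as follows. This is a minimal illustrative sketch only: detect_change, match_sticker, and synthesize are hypothetical placeholders for the detection, server-push, and synthesis steps, not APIs defined by this application.

```python
# Minimal sketch of the terminal-side flow; detect_change, match_sticker,
# and synthesize are hypothetical placeholder callables.
def record_with_auto_stickers(frames, detect_change, match_sticker, synthesize):
    """frames: iterable of captured frames (the first media information)."""
    collected, stickers = [], []
    for frame in frames:
        change = detect_change(frame)          # expression / user-action change amount
        if change is not None:
            key_info = {"change_amount": change}   # key information
            sticker = match_sticker(key_info)  # second media info pushed by server
            if sticker is not None:
                stickers.append(sticker)
        collected.append(frame)
    return synthesize(collected, stickers)     # video synthesis per preset config
```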
- FIG. 1 is a schematic diagram of hardware entities of each party performing information interaction in an embodiment of the present application
- FIG. 2 is a schematic diagram of an implementation process of Embodiment 1 of the present application.
- FIG. 3 is a schematic diagram of an application scenario applied to an embodiment of the present application.
- FIG. 4 is a schematic diagram of trigger video recording using an embodiment of the present application.
- FIG. 13 is a schematic flowchart of an implementation process of Embodiment 2 of the present application.
- FIG. 14 is a schematic structural diagram of a third embodiment of the present application.
- FIG. 15 is a schematic structural diagram of a hardware component of Embodiment 4 of the present application.
- FIG. 16 is a schematic diagram of a scenario in which RGB and transparency are separately stored in an embodiment of the present application
- FIG. 17 is a system architecture diagram of an example to which an embodiment of the present application is applied.
- FIG. 1 is a schematic diagram of the hardware entities performing information interaction in the embodiment of the present application. FIG. 1 includes a server 11 and terminal devices 21, 22, 23, and 24, where the terminal devices 21, 22, 23, and 24 exchange information with the server through a wired or wireless network.
- The terminal device may include a mobile phone, a desktop computer, a PC, an all-in-one machine, and the like. The terminal device is installed with various applications that meet the user's daily and work needs: if the user likes taking pictures and recording video, applications such as image processing applications and video processing applications are installed in the terminal device; social applications are also installed for social sharing needs.
- The processing results obtained with the image processing application and the video processing application can also be shared via the social application.
- The terminal device periodically obtains update data packets of each application from the server and saves them locally for use when needed.
- The user starts an application on the terminal device (such as a video processing application) and acquires a first operation, such as an operation of turning on video recording, thereby triggering collection of first media information such as a video.
- When the terminal device detects, in the process of collecting the first media information, a change of the expression in a face area that meets the preset condition or a change of the user action in the collection frame, it reports the obtained change amount to the server as key information.
- For example, the expression change in the face area may be a smile, and the user action change may be blinking or making a scissors-hand gesture.
- The terminal then receives second media information, such as a sticker, corresponding to the key information pushed by the server, and performs video synthesis on the first media information and the second media information.
- In this way, the corresponding second media information is obtained from the server based on the change amount, the first media information and the second media information are video-synthesized, and the synthesized video is played back after the first media information is collected, with the corresponding second media information displayed at a specified position and a specified time of the first media information.
- Because the second media information does not need to be manually selected and added by the user, the operation flow is simplified and the processing efficiency is improved; and because the second media information is requested based on the detection result (such as the expression change or the user action change) obtained in the process of collecting the first media information, the second media information also better matches real user needs.
- FIG. 1 is only an example of a system architecture that implements the embodiments of the present application.
- the embodiment of the present application is not limited to the system structure described in FIG. 1 above, and various embodiments of the present application are proposed based on the system architecture.
- the information processing method of the embodiment of the present application is as shown in FIG. 2, and the method includes:
- Step 101: The terminal acquires a first operation to trigger collection of the first media information.
- the user is lying on the sofa using a terminal device such as the mobile phone 11.
- the user interface of the mobile phone 11 is as shown in FIG. 4, and includes various types of application icons, such as a music play icon, a function setting icon, a mail sending and receiving icon, and the like.
- The user performs the first operation, such as clicking the video processing application icon identified by A1 with a finger, and enters the process of video recording, thereby triggering the collection of the first media information such as a video. For example, the user can record a scene in the room or take a self-portrait.
- Step 102: When the terminal detects, in the process of collecting the first media information, a change of the expression in the face area that meets the preset condition or a change of the user action in the collection frame, the terminal reports the obtained change amount to the server as key information.
- In the process of video recording, the terminal can capture expression changes in the face area, for example, smiling, crying, or frowning.
- The terminal device can also detect changes of user actions in the collection frame (or viewfinder frame), such as a scissors-hand gesture; this detection is not limited to the face area. It is also possible to combine expression changes in the face area with user action changes, for example, combining the scissors hand with a smile among the facial expressions for joint recognition.
- Face recognition technology is based on a person's facial features. It collects face images or a video stream during video recording and first determines whether a face is present in the video stream. If a face is present, the position and size of the face are given, the position information of each main facial organ is located, and the initial position and form of each facial feature in the face are obtained.
- When the form changes, for example when the user smiles, the upper and lower lips are displaced and deformed relative to their initial form, which indicates that the facial expression has changed; such displacement and deformation can be used to recognize the expression change.
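- As a concrete illustration of this detection step, the following minimal Python sketch uses OpenCV Haar cascades (an assumption; the application does not name a specific detector) to treat a smile inside a detected face area as the expression-change key information.

```python
# Minimal sketch: detect a smiling face and return it as key information.
# OpenCV Haar cascades are an assumption; any detector could be used.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
smile_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_smile.xml")

def detect_expression_change(frame):
    """Return key information if a smile is detected in a face area, else None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        face_roi = gray[y:y + h, x:x + w]
        if len(smile_cascade.detectMultiScale(face_roi, 1.7, 20)) > 0:
            return {"expression": "smile", "face_box": (x, y, w, h)}
    return None
```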
- The face recognition in the embodiment of the present application is different from conventional face recognition: conventional face recognition identifies the user's identity through a constructed face recognition system, comparing the recognized face with known faces to facilitate identity verification and identity lookup.
- The process of expression recognition can be divided into four stages: acquisition and preprocessing of face images; face detection; expression feature extraction; and expression classification. Relying only on the face recognition and positioning mechanism would introduce inaccuracies, so the expression recognition mechanism is the more accurate processing strategy.
- Expression recognition is closely related to face recognition: links such as positioning in face detection and face tracking are similar, but the feature extraction differs. For example, the features extracted by face recognition mainly focus on individual differences and characteristics of different faces, and facial expressions exist there as interference signals, so face recognition does not pay much attention to facial expressions.
- In contrast, the embodiment of the present application needs to pay attention to expression changes in order to trigger the corresponding second media information, so individual differences can be ignored, and attention is paid to extracting the difference features of the face under different expression modes.
- Feature extraction is the core step in facial expression recognition; it determines the final recognition result and affects the recognition rate. Feature extraction can be divided into static image feature extraction and moving image feature extraction.
- In static image feature extraction, the deformation features of the expression (the transient features of the expression) are extracted. In moving image feature extraction, not only the expression change characteristics of each frame but also the motion characteristics of the continuous sequence are extracted.
- Deformation feature extraction can rely on neutral expressions or models, comparing the generated expression with the neutral expression to extract the deformation features, while the extraction of motion features depends directly on the facial changes produced by the expressions.
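- A minimal sketch of the two kinds of feature extraction, assuming facial landmark coordinates (for example, mouth corners) are already available per frame; the landmark detector itself is outside this sketch.

```python
# Minimal sketch of deformation vs. motion feature extraction on landmarks.
import numpy as np

def deformation_features(neutral_landmarks, current_landmarks):
    """Displacement of each landmark relative to the neutral expression."""
    return np.asarray(current_landmarks) - np.asarray(neutral_landmarks)

def motion_features(landmark_sequence):
    """Frame-to-frame landmark displacement over a continuous sequence."""
    seq = np.asarray(landmark_sequence)
    return seq[1:] - seq[:-1]
```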
- Step 103: The terminal receives second media information corresponding to the key information that is pushed by the server.
- A specific implementation of this step may be: after the key information is reported to the server in step 102, the server matches the corresponding second media information, such as sticker information, from the material library according to the key information and pushes it to the terminal, so that video synthesis with the first media information is subsequently performed in step 104.
- The user does not need to manually select sticker information; instead, the server automatically pushes matching sticker information to the terminal based on the key information, and the video processing result is automatically synthesized (for example, by superimposing the video and the sticker information) in the process of collecting the first media information (such as the video).
- The sticker information is displayed at a specified position and a specified time of the first media information (such as the video).
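- The server-side matching in step 103 can be sketched as a lookup into the material library; the library contents and field names below are assumptions for illustration.

```python
# Minimal sketch of server-side matching of key information to sticker
# information; library contents and field names are assumptions.
MATERIAL_LIBRARY = {
    "smile": {"sticker_id": "blush_01", "anchor": "cheek"},
    "blink": {"sticker_id": "coin_eyes_01", "anchor": "eyes"},
    "year-end award": {"sticker_id": "bonus_text_01", "anchor": "screen_center"},
}

def match_sticker(key_information):
    """Return the second media information matching the reported key information."""
    key = key_information.get("expression") or key_information.get("text")
    return MATERIAL_LIBRARY.get(key)
```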
- Step 104: Perform video synthesis on the first media information and the second media information.
- Optionally, the key information further includes text information in the first media information.
- Correspondingly, the information processing method further includes: detecting the text information in the process of collecting the first media information and reporting it to the server as key information.
- In the prior art, as shown in FIG. 5, the text information "A red fire" identified by A2 is included in the video information; after the video information is recorded, the sticker information "Red Fire" indicated by A2' is added. The sticker information is manually selected from the server material library through multiple interactions between the terminal and the server, and then attached to the video information that has already been recorded.
- FIG. 6 shows another application scenario of the prior art. Specifically, the text information "boyfriend" identified by A3 is included in the video information; after the video information recording is completed, the sticker information "boyfriend" identified by A3' is added. Again, the sticker information is manually selected from the server material library through multiple interactions between the terminal and the server, and then attached to the recorded video information. This kind of processing is very cumbersome and requires multiple user interactions.
- Moreover, the sticker found in this way is not necessarily what the user really needs; and even if it is, the user still needs to manually add it to the already recorded video information, for example by moving the sticker to the appropriate location of the video information, and so on.
- In contrast, with the embodiment of the present application, the video shown in FIG. 7 includes the text information "eat not fat" identified by A4, which is sent to the server as key information; the matching sticker information obtained based on the key information is identified by A4'.
- The video shown in FIG. 8 includes the text information "boyfriend" identified by A5, which is sent to the server as key information; the matching sticker information obtained based on the key information is identified by A5'.
- B1 is used to identify the control button during video recording
- B2 is used to identify the playback button after the video recording is over.
- FIG. 9 is a schematic diagram of playing back the video after the sticker information and the video are synthesized at a suitable position and time point during video recording.
- FIG. 10 is a schematic diagram of playing back the video after the sticker information and the video are synthesized at a suitable position and time point after video recording. When the recorded video plays to the point where the year-end award is mentioned, the text information corresponding to the voice can be displayed on the video interface, and the synthesized sticker information is displayed as well: the dynamic sticker effect shows "a lot of year-end awards" matched with a currency-unit indicator such as ¥, combined with the text "A lot of year-end awards."
- Such sticker shapes can be obtained by recognizing the facial expression or the user's motion, and the user action can be combined with the voice. For example, the user action can be a happy blink of the user.
- During this "happy blink" period, other sticker information such as that identified by A6' may also be displayed on the video interface, in which the eyes become two ¥ symbols.
- The user action can also be a finger snap; this user action triggers the display of "the eyes become two ¥" as indicated by A6' in FIG. 11, or the display of the sticker information "a lot of year-end awards" as shown in FIG. 10.
- FIG. 12 shows another application example using the embodiment of the present application, in which other sticker shapes are obtained by recognizing facial expressions. When the voice played in the recorded video information is "I am so beautiful" as identified by A7, the position of the cheeks on the face is recognized, and the sticker information shown by A7' is superimposed at the cheek position.
- The sticker information is a blush on the cheeks, belonging to the facial-feature sticker type, so that while the voice "I am so beautiful?" plays, the synthesized sticker information on the video interface makes the person's face appear flushed.
- the information processing method of the embodiment of the present application is as shown in FIG. 13 , and the method includes:
- Step 201: The terminal starts the application, acquires the first operation, and triggers collection of the first media information.
- the user is lying on the sofa using a terminal device such as the mobile phone 11.
- the user interface of the mobile phone 11 is as shown in FIG. 4, and includes various types of application icons, such as a music play icon, a function setting icon, a mail sending and receiving icon, and the like.
- the user performs the first operation, such as clicking the video processing application icon identified by the A1 with a finger to enter the process of video recording, thereby triggering the collection of the first media information, such as a video. For example, you can record a scene in a room, or take a self-portrait for yourself.
- Step 202: When the terminal detects, in the process of collecting the first media information, a change of the expression in the face area that meets the preset condition or a change of the user action in the collection frame, the terminal reports the obtained change amount to the server as key information.
- In the process of video recording, the terminal device can capture expression changes in the face area, such as smiling, crying, or frowning.
- The terminal device can also detect changes of user actions in the collection frame (or viewfinder frame), for example, a scissors-hand gesture; this detection is not limited to the face area. It is also possible to combine expression changes in the face area with user action changes, for example, combining the scissors hand with a smile among the facial expressions for joint recognition.
- Face recognition technology is based on a person's facial features. It collects face images or a video stream during video recording and first determines whether a face is present in the video stream. If a face is present, the position and size of the face are given, the position information of each main facial organ is located, and the initial position and form of each facial feature in the face are obtained.
- When the form changes, for example when the user smiles, the upper and lower lips are displaced and deformed relative to their initial form, which indicates that the facial expression has changed; such displacement and deformation can be used to recognize the expression change.
- The face recognition in the embodiment of the present application is different from conventional face recognition: conventional face recognition identifies the user's identity through a constructed face recognition system, comparing the recognized face with known faces to facilitate identity verification and identity lookup.
- The process of expression recognition can be divided into four stages: acquisition and preprocessing of face images; face detection; expression feature extraction; and expression classification. Relying only on the face recognition and positioning mechanism would introduce inaccuracies, so the expression recognition mechanism is the more accurate processing strategy.
- Expression recognition is closely related to face recognition: links such as positioning in face detection and face tracking are similar, but the feature extraction differs. For example, the features extracted by face recognition mainly focus on individual differences and characteristics of different faces, and facial expressions exist there as interference signals; that is, face recognition does not pay much attention to facial expressions.
- In contrast, the embodiment of the present application needs to pay attention to expression changes in order to trigger the corresponding second media information.
- Feature extraction is the core step in facial expression recognition; it determines the final recognition result and affects the recognition rate. Feature extraction can be divided into static image feature extraction and moving image feature extraction.
- In static image feature extraction, the deformation features of the expression (the transient features of the expression) are extracted. In moving image feature extraction, not only the expression change characteristics of each frame but also the motion characteristics of the continuous sequence are extracted.
- Deformation feature extraction can rely on neutral expressions or models, comparing the generated expression with the neutral expression to extract the deformation features, while the extraction of motion features depends directly on the facial changes produced by the expressions.
- Step 203: The server selects, from the material library, the second media information corresponding to the key information and a description file of the second media information.
- Step 204: The terminal receives the second media information corresponding to the key information and the description file of the second media information pushed by the server.
- A specific implementation of this step may be: after the key information is reported to the server in step 202, the server matches the corresponding second media information, such as sticker information, from the material library according to the key information and pushes it to the terminal, so that video synthesis with the first media information is subsequently performed in step 205.
- The user does not need to manually select sticker information; instead, the server automatically pushes matching sticker information to the terminal based on the key information, and the video processing result is automatically synthesized (for example, by superimposing the video and the sticker information) in the process of collecting the first media information (such as the video).
- The sticker information is displayed at a specified position and a specified time of the first media information (such as the video).
- The description file and the second media information corresponding to the key information may be sent together or separately, depending on the current network condition: if the network condition is good, they are sent together; if the network condition is poor, they are sent separately to avoid data loss on the poor network.
- Step 205: Perform video synthesis on the first media information and the second media information.
- Optionally, the key information further includes text information in the first media information.
- Correspondingly, the method further includes: detecting the text information in the process of collecting the first media information and reporting it to the server as key information.
- In the prior art, as shown in FIG. 5, the text information "A red fire" identified by A2 is included in the video information; after the video information is recorded, the sticker information "Red Fire" identified by A2' is added.
- The sticker information is manually selected from the server material library through multiple interactions between the terminal and the server, and then attached to the video information that has already been recorded.
- FIG. 6 shows another application scenario of the prior art. The video information includes the text information "boyfriend" identified by A3; after the video information recording is completed, the sticker information "boyfriend" identified by A3' is added.
- Again, the sticker information is manually selected from the server material library through multiple interactions between the terminal and the server, and then attached to the recorded video information. This kind of processing is very cumbersome and requires multiple user interactions.
- Moreover, the sticker found in this way is not necessarily what the user really needs; and even if it is, the user still needs to manually add it to the already recorded video information, for example by moving the sticker to the appropriate location of the video information, and so on.
- In contrast, with the embodiment of the present application, the video shown in FIG. 7 includes the text information "eat not fat" identified by A4, which is sent to the server as key information; the matching sticker information obtained based on the key information is identified by A4'.
- The video shown in FIG. 8 includes the text information "boyfriend" identified by A5, which is sent to the server as key information; the matching sticker information obtained based on the key information is identified by A5'.
- B1 is used to identify the control button during video recording
- B2 is used to identify the playback button after the video recording is over.
- FIG. 9 is a schematic diagram of playing back the video after synthesizing the sticker information and the video at a suitable position and time point during recording of the video.
- FIG. 10 is a schematic diagram of playing back the video after the sticker information and the video are synthesized at a suitable position and time point after video recording. When the recorded video plays to the point where the year-end award is mentioned, the text information corresponding to the voice can be displayed on the video interface, and the synthesized sticker information is displayed as well: the dynamic sticker effect shows "a lot of year-end awards" matched with a currency-unit indicator such as ¥, combined with the text "A lot of year-end awards."
- Such sticker shapes can be obtained by recognizing the facial expression or the user's motion, and the user action can be combined with the voice. For example, the user action can be a happy blink of the user.
- In addition to the display shown in FIG. 10, other sticker information such as "the eyes become two ¥" identified by A6' may also be displayed on the video interface during this "happy blink" period.
- The user action may also be a finger snap.
- FIG. 12 shows another application example using the embodiment of the present application, in which other sticker shapes are obtained by recognizing facial expressions. When the voice played in the recorded video information is "I am so beautiful" as identified by A7, the position of the cheeks on the face is recognized, and the sticker information shown by A7' is superimposed at the cheek position.
- The sticker information is a blush on the cheeks, belonging to the facial-feature sticker type, so that while the voice "I am so beautiful?" plays, the synthesized sticker information on the video interface makes the person's face appear flushed.
- The video synthesis of the first media information and the second media information includes two implementation solutions.
- In the first implementation solution, in response to the expression change or the user action change, a corresponding feature detection result is acquired; the second media information is video-synthesized with the first media information according to the feature detection result and the configuration in the description file of the second media information, and the second media information is displayed at the position specified by the first media information at a specified time point or within a specified time period.
- In the second implementation solution, in response to the text information, the second media information is video-synthesized with the first media information according to the configuration in the description file of the second media information, and the second media information is displayed at the position specified by the first media information at a specified time point or within a specified time period.
- The difference between the two schemes is that in the first scheme, feature coordinates (part or all of the feature detection result) must be obtained so as to determine, in combination with those coordinates, the suitable designated position in the video information where the sticker information is to be placed, whereas in the second scheme the description file of the second media information alone determines the position and the time point.
- The placement of the sticker information has a fixed-position and fixed-time requirement: according to the specified position and time point, the sticker information can be superimposed on the video information at the suitable time point, for example, as shown in FIG. 10.
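- The description file of the second media information might look like the following; the field names are assumptions, since the application only states that it carries the position relative to the first media information and the display time.

```python
# Minimal sketch of a description file; field names are assumptions.
import json

description_file = json.loads("""
{
  "sticker_id": "blush_01",
  "anchor": "cheek",
  "position": {"x": 0.62, "y": 0.55},
  "start_time_ms": 3200,
  "duration_ms": 1500
}
""")
```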
- The second media information includes at least one of the following categories: 1) a first type of sticker information triggered by the expression change or the user action change, such as facial-feature stickers and trigger-type stickers; and 2) a second type of sticker information whose display is not triggered by the expression change or the user action change, such as background stickers.
- Responding to the expression change or the user action change, acquiring a corresponding feature detection result, and video-synthesizing the second media information with the first media information according to the feature detection result and the configuration of the description file of the second media information includes the following:
- A2. Detecting the change in feature coordinates caused by the expression change or the user action change, and locating from the initial coordinates to the target coordinates, so as to determine the superposition position according to the position point obtained by the target-coordinate positioning or the position area defined from the initial coordinates to the target coordinates.
- A4. Performing video synthesis on the second media information and the first media information according to the determined position and the display time parsed from the description file for the first type of sticker information.
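- The superposition in step A4 can be sketched as alpha blending at the determined position, gated by the display time parsed from the description file; the BGRA sticker format and the field names are assumptions.

```python
# Minimal sketch of superimposing sticker information onto a video frame.
import numpy as np

def overlay_sticker(frame, sticker_bgra, x, y):
    """Alpha-blend a BGRA sticker onto a BGR frame at top-left (x, y).
    Assumes the sticker lies fully inside the frame."""
    h, w = sticker_bgra.shape[:2]
    roi = frame[y:y + h, x:x + w].astype(np.float32)
    alpha = sticker_bgra[:, :, 3:4].astype(np.float32) / 255.0
    rgb = sticker_bgra[:, :, :3].astype(np.float32)
    frame[y:y + h, x:x + w] = (alpha * rgb + (1.0 - alpha) * roi).astype(np.uint8)
    return frame

def apply_within_window(frame, t_ms, desc, sticker_bgra, x, y):
    """Superimpose only within the display time given by the description file."""
    start = desc["start_time_ms"]
    if start <= t_ms < start + desc["duration_ms"]:
        overlay_sticker(frame, sticker_bgra, x, y)
    return frame
```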
- Responding to the text information, video-synthesizing the second media information with the first media information according to the configuration of the description file of the second media information includes:
- As shown in FIG. 14, the terminal includes:
- a triggering unit 21, configured to acquire a first operation to trigger collection of the first media information;
- a detecting unit 22, configured to, when a change of the expression in a face area that meets the preset condition or a change of the user action in the collection frame is detected in the process of collecting the first media information, report the obtained change amount to the server as key information;
- a receiving unit 23, configured to receive the second media information corresponding to the key information that is pushed by the server; and
- a synthesizing unit 24, configured to perform video synthesis on the first media information and the second media information.
- the user is lying on the sofa using a terminal device such as the mobile phone 11.
- the user interface of the mobile phone 11 is as shown in FIG. 4, and includes various types of application icons, such as a music play icon, a function setting icon, a mail sending and receiving icon, and the like.
- the user performs the first operation, such as clicking the video processing application icon identified by the A1 with a finger to enter the process of video recording, thereby triggering the collection of the first media information, such as a video. For example, you can record a scene in a room, or take a self-portrait for yourself.
- the terminal can capture expression changes in the face area, such as smiling, crying, frowning, and the like.
- the terminal device can also detect changes in user movements in the collection frame (or the frame), such as a scissors hand. It is also possible to combine the expression changes in the face area with the changes in the user's movements, for example, combining the scissors hands and the smiles in the facial expressions for combined recognition.
- Face recognition technology is based on a person's facial features. It collects face images or a video stream during video recording and first determines whether a face is present in the video stream. If a face is present, the position and size of the face are given, the position information of each main facial organ is located, and the initial position and form of each facial feature in the face are obtained.
- When the form changes, for example when the user smiles, the upper and lower lips are displaced and deformed relative to their initial form, which indicates that the facial expression has changed; such displacement and deformation can be used to recognize the expression change.
- The face recognition of the embodiment of the present application is different from conventional face recognition: conventional face recognition identifies the user's identity through a constructed face recognition system, comparing the recognized face with known faces for identity confirmation and identity lookup.
- The process of expression recognition can be divided into four stages: acquisition and preprocessing of face images; face detection; expression feature extraction; and expression classification. Relying only on the face recognition and positioning mechanism would introduce inaccuracies, so the expression recognition mechanism is the more accurate processing strategy.
- Expression recognition is closely related to face recognition. For example, the positioning in face detection and face tracking are similar, but the feature extraction is different. For example, the features extracted by face recognition mainly focus on individual differences and characteristics of different faces, while facial expressions exist as interference signals, so face recognition does not pay much attention to facial expressions.
- In contrast, the embodiment of the present application needs to pay attention to expression changes in order to trigger the corresponding second media information, so feature extraction focuses on the difference features of the face under different expression modes. It can be combined with individual differences, or individual differences can be treated as interference signals, that is, given little attention, in order to improve the accuracy of expression recognition.
- Feature extraction is the core step in facial expression recognition, which determines the final recognition result and affects the recognition rate.
- the feature extraction can be divided into: static image feature extraction and moving image feature extraction. In terms of static image feature extraction, the deformation features of the expression (or the transient features of the expression) are extracted.
- In moving image feature extraction, not only the expression change characteristics of each frame but also the motion characteristics of the continuous sequence are extracted.
- Deformation feature extraction can rely on neutral expressions or models to compare the generated expressions with neutral expressions to extract deformation features, while the extraction of motion features is directly dependent on the facial changes produced by the expressions.
- There are many ways to classify expressions: 1) by basic expressions, such as happiness, sadness, surprise, fear, anger, and disgust, creating different facial expression image libraries for subsequent matching and recognition; 2) by emotion, such as happy, unpleasant, excited, calm, nervous, relaxed, and so on.
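- The classification stage can then feed sticker matching directly; the following sketch assumes any classifier object with an sklearn-style predict method, which is not specified by the application.

```python
# Minimal sketch of the expression-classification stage; the classifier
# (any object with an sklearn-style predict method) is an assumption.
BASIC_EXPRESSIONS = ["happiness", "sadness", "surprise", "fear", "anger", "disgust"]

def classify_expression(features, classifier):
    """Map extracted expression features to one basic-expression label,
    which then serves as key information for sticker matching."""
    return classifier.predict([features])[0]
```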
- the key information further includes: text information in the first media information.
- the detecting unit 22 is further configured to detect the text information in the process of collecting the first media information, and report the text information to the server as key information.
- In the prior art, as shown in FIG. 5, the text information "A red fire" identified by A2 is included in the video information; after the video information is recorded, the sticker information "Red Fire" identified by A2' is added.
- The sticker information is manually selected from the server material library through multiple interactions between the terminal and the server, and then attached to the video information that has already been recorded.
- FIG. 6 shows another application scenario of the prior art. Specifically, the text information "boyfriend" identified by A3 is included in the video information; after the video information recording is completed, the sticker information "boyfriend" identified by A3' is added.
- Again, the sticker information is manually selected from the server material library through multiple interactions between the terminal and the server, and then attached to the recorded video information. This kind of processing is very cumbersome and requires multiple user interactions.
- Moreover, the sticker found in this way is not necessarily what the user really needs; and even if it is, the user still needs to manually add it to the already recorded video information, for example by moving the sticker to the appropriate location of the video information, and so on.
- the video shown in FIG. 7 includes the text information “eat not fat” identified by A4, which is sent to the server as key information.
- the matching sticker information obtained based on the key information is identified as A4'.
- the video shown in FIG. 8 includes the text information "boyfriend” identified by A5, which is sent to the server as key information.
- the matching sticker information obtained based on the key information is identified as A5'.
- B1 is used to identify the control button during video recording
- B2 is used to identify the playback button after the video recording is over.
- FIG. 9 is a schematic diagram of playing back the video after the sticker information and the video are synthesized at a suitable position and time point during video recording.
- FIG. 10 is a schematic diagram of playing back the video after the sticker information and the video are synthesized at a suitable position and time point after video recording. When the recorded video plays to the point where the year-end award is mentioned, the text information corresponding to the voice can be displayed on the video interface, and the synthesized sticker information is displayed as well: the dynamic sticker effect shows "a lot of year-end awards" matched with a currency-unit indicator such as ¥, combined with the text "A lot of year-end awards."
- Such sticker shapes can be obtained by recognizing the facial expression or the user's motion. For example, when the voice played in the recorded video is "year-end award" as identified by A6, the user action and the voice can be combined.
- The user action can be a happy blink of the user. During this "happy blink" time period, other sticker information is also displayed on the video interface, such as "the eyes become two ¥" identified by A6'.
- In addition to blinking, the user action can also be a finger snap, which triggers the display of "the eyes become two ¥" as indicated by A6' in FIG. 11, or of the sticker information "a lot of year-end awards" as shown in FIG. 10.
- FIG. 12 shows another application example using the embodiment of the present application, in which other sticker shapes are obtained by recognizing facial expressions. When the voice played in the recorded video information is "I am so beautiful" as identified by A7, the position of the cheeks on the face is recognized, and the sticker information shown by A7' is superimposed at the cheek position.
- The sticker information is a blush on the cheeks, belonging to the facial-feature sticker type.
- The receiving unit 23 is further configured to receive a description file of the second media information corresponding to the key information that is pushed by the server.
- The description file includes the position of the second media information relative to the first media information and the display time of the second media information.
- The synthesizing unit 24 is further configured to perform video synthesis on the first media information and the second media information according to the description file, so that the second media information is displayed, within the display time specified in the description file, at the position of the first media information specified by the description file.
- The synthesizing unit 24 supports two specific implementations:
- In the first specific implementation, in response to the expression change or the user action change, a corresponding feature detection result is acquired; the second media information is video-synthesized with the first media information according to the feature detection result and the configuration in the description file of the second media information, and the second media information is displayed at the position specified by the first media information at the specified time point or within the specified time period.
- In the second specific implementation, in response to the text information, the second media information is video-synthesized with the first media information according to the configuration in the description file of the second media information, and the second media information is displayed at the position specified by the first media information at the specified time point or within the specified time period.
- The second media information includes at least one of the following categories: the first type of sticker information, triggered by the expression change or the user action change; and the second type of sticker information, whose display is not triggered by the expression change or the user action change.
- The synthesizing unit 24 is further configured to:
- The above terminal may be an electronic device such as a PC, or a portable electronic device such as a PAD, a tablet computer, or a laptop computer, or an intelligent mobile terminal such as a mobile phone, and is not limited to the description herein.
- The server may be implemented as a cluster system, either integrated into one electronic device or split into separate electronic devices each realizing a unit function. Both the terminal and the server include at least a database for storing data and a processor for data processing, or include a storage medium set inside the server or an independently set storage medium.
- As the processor for data processing, a microprocessor, a central processing unit (CPU), a digital signal processor (DSP), or a programmable logic array such as an FPGA (Field-Programmable Gate Array) may be used when performing processing.
- the operation instruction may be computer-executable code, and is used to perform the steps of the information processing method in the foregoing embodiments of the present application.
- the apparatus includes a processor 41, a storage medium 42, and at least one external communication interface 43; the processor 41, the storage medium 42, and the external communication interface 43 are all connected by a bus 44.
- An application scenario of the prior art is as follows: the text information "red fire" identified by A2 is included in the video information. After the video recording is completed, the sticker information "red fire" indicated by A2' is manually selected from the server material library through multiple interactions between the terminal and the server, and then attached to the recorded video information.
- FIG. 6 shows another application scenario of the prior art. Specifically, the text information "boyfriend" identified by A3 is included in the video information. After the video recording is completed, the sticker information "boyfriend" identified by A3' is added.
- The sticker information is manually selected from the server material library through multiple interactions between the terminal and the server, and then attached to the recorded video information. This kind of processing is very cumbersome and requires many user operations, and the sticker found afterwards is not necessarily what the user really needs. Even if it is, the user still has to add it to the already-recorded video information manually, for example by dragging the sticker to an appropriate location in the video.
- The existing video processing technology works as follows: the application (APP) provides some fixed stickers; the user first records the video, then opens the sticker material library, selects the material he or she considers relevant, and determines through complex interactions when and for how long each sticker is added. Some APPs also allow a sticker to be moved, by pressing and holding the sticker and dragging it to a specific location. The consequence is that multiple tedious interactions between the terminal and the server are required, the processing efficiency is low, the sticker is manually selected and synthesized only after video recording is completed, and the video processing cost is high, which wastes time and does not necessarily meet the user's needs.
- The embodiment of the present application provides a real-time dynamic sticker scheme for video.
- With the face recognition and positioning mechanism, the expression recognition mechanism, and the video synthesis processing mechanism of the present application, sticker information related to the video material can be selected from a pile of materials without requiring the user to perform complicated operations, and the corresponding sticker information is superimposed at the specified position and at the specified time point during video recording, so that it can be seen at the corresponding place, as shown in FIGs. 7-12.
- the video information as shown in FIG. 7 includes the text information “eat not fat” identified by A4, which is sent to the server as key information.
- the matching sticker information obtained based on the key information is identified as A4'.
- the video as shown in FIG. 8 includes the text information "boyfriend" identified by A5, which is sent to the server as key information.
- the matching sticker information obtained based on the key information is identified as A5'.
- B1 is used to identify the control button during video recording
- B2 is used to identify the playback button after the video recording is over.
- As shown in FIG. 9, the sticker information and the video are synthesized at a suitable position and time point during video recording, and the video is then played back.
- When the recorded video information is played, the corresponding voice information is played, the text of the corresponding voice can be displayed on the video interface, and the synthesized sticker information is displayed on the video interface in the form of a scroll with a dynamic sticker effect.
- As shown in FIG. 10, the sticker information and the video are synthesized at a suitable position and time point, and the video is then played back. When the corresponding recorded video information is played, the voice is "a lot of year-end awards", and the text of the corresponding voice can be displayed on the video interface. The synthesized sticker information is also displayed on the video interface: "year-end awards" is displayed with a dynamic sticker effect and matched with an indicator of the currency unit, for example ¥, combined with the text "a lot of year-end awards".
- As shown in FIG. 11, sticker shapes can also be obtained by recognizing the facial expression or the user's motion, and the user action and the voice can be combined. For example, the user action can be a happy blink of the user; during the "happy blink" period, sticker information such as that identified by A6, where the eyes become two ⁇, may also be displayed on the video interface. The user action can also be a snap; this user action triggers the display of "the eyes become two ⁇" as indicated by A6' in FIG. 11, or the sticker information "a lot of year-end awards" as shown in FIG. 10.
- FIG. 12 shows another application example using an embodiment of the present application, in which other sticker shapes are obtained by recognizing facial expressions. The voice played in the corresponding recorded video information is "I am so beautiful", as indicated by A7; the position of the cheeks on the face is recognized, and the sticker information is superimposed at that position, as shown by A7'. The sticker information is blushing cheeks (a blush) belonging to the facial-features sticker type. When the voice "I am so beautiful" is played, the synthesized sticker information is also displayed on the video interface, and the person's face shows a blush.
- The sticker information is divided into the following categories:
- Trigger-type stickers: a set of stickers that appear when a specific action is detected; the stickers that appear can be either ordinary stickers or facial-features stickers;
- For trigger-type stickers and facial-features sticker information, as shown in FIG. 17, the sticker information must first be combined with the feature coordinates and then synthesized with the recorded video; that is, it involves the feature detector and the material parser in FIG. 17, and then the video synthesizer. This is because when expressions, actions, or facial features change, the coordinates change as well.
- For other (ordinary) sticker information, the sticker is directly combined with the recorded video; that is, only the video synthesizer in FIG. 17 is involved, because the coordinates usually do not change.
- the technical implementation also includes the following:
- The sticker information of each video is taken as part of the material, placed in the material package, and delivered with the material.
- the material includes, in addition to the sticker information, a description file of the sticker information and the like.
- the dynamic material consists of two parts:
- A) The original form of the sticker information, which has three main formats: i) a static image; ii) a dynamic Graphics Interchange Format (GIF) image; iii) a video.
- For picture-type sticker information files such as static images and dynamic GIF images, transparent images such as Portable Network Graphics (PNG) pictures are used to realize video synthesis.
- For video-type sticker information files, because video is not transparent, when a video is used as material its resolution is twice that of the captured video: half of the pixels are used to represent the RGB values of the sticker, and the other half are used to indicate the transparency of the sticker.
- The video-type sticker information is stored as follows: the RGB channels and the transparency channel are separated, and the material video is divided into one half storing the material RGB and the other half storing the material transparency, as shown in FIG. 16.
- RGB is a color standard in which colors are obtained by varying and superimposing the three color channels red (R), green (G), and blue (B); this standard covers almost all colors that human vision can perceive.
- The RGB value of the composite video = a * (RGB value of the video-type sticker information) + (1 - a) * (RGB value of the captured video), where a is the transparency (alpha) of the sticker pixel.
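- As a minimal sketch of this storage and blending scheme, the compositing formula above can be implemented as follows; a side-by-side layout (sticker RGB in the left half, transparency in the right half) is an assumption made for this sketch.

```python
import numpy as np

def composite(material_frame: np.ndarray, captured_frame: np.ndarray) -> np.ndarray:
    """Blend a video-type sticker frame onto a captured frame.

    material_frame is twice as wide as captured_frame: its left half holds
    the sticker RGB and its right half the sticker transparency.
    """
    h, w, _ = material_frame.shape
    sticker_rgb = material_frame[:, : w // 2].astype(np.float32)
    # One channel of the transparency half gives the alpha value a in [0, 1].
    alpha = material_frame[:, w // 2 :, 0].astype(np.float32)[..., None] / 255.0
    # Composite RGB = a * sticker RGB + (1 - a) * captured RGB
    out = alpha * sticker_rgb + (1.0 - alpha) * captured_frame.astype(np.float32)
    return out.astype(np.uint8)

# Tiny example: a white sticker at ~50% transparency over a black frame.
material = np.zeros((4, 8, 3), np.uint8)
material[:, :4] = 255   # sticker RGB half
material[:, 4:] = 128   # transparency half (~0.5)
captured = np.zeros((4, 4, 3), np.uint8)
print(composite(material, captured)[0, 0])   # -> [128 128 128]
```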
- The description file contains: i) the position of the center point where the sticker appears; ii) the time when the sticker appears. Therefore, based on the sticker information and its description file, the server can actively push a sticker, and the terminal can superimpose the appropriate dynamic sticker at the suitable time and position in the video being recorded, without the user manually selecting the sticker.
- The appearance time of the sticker includes: a) for a dynamic sticker played once, the start time needs to be set; b) for a dynamic sticker played repeatedly, the start and end times need to be set.
- The facial-features position information includes: i) top of the head; ii) eyes; iii) cheeks; iv) mouth; v) nose.
- Trigger-type stickers need trigger conditions to be set, including: i) opening the mouth; ii) blinking; iii) smiling; iv) raising the eyebrows.
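- For illustration, hypothetical description files covering the fields above (center position, appearance time for one-shot vs. repeated playback, facial-feature anchor, and trigger condition) might look like the following; all field names are assumptions, not the patent's actual format.

```python
# All field names below are illustrative assumptions, not the patent's format.

ordinary_sticker = {
    "resource": "year_end_award.png",
    "center": {"x": 0.5, "y": 0.8},       # center point where the sticker appears
    "play": {"mode": "repeat", "start": 2.0, "end": 6.5},  # repeated playback: start and end
}

facial_feature_sticker = {
    "resource": "blush.png",
    "anchor": "cheeks",                      # top_of_head / eyes / cheeks / mouth / nose
    "play": {"mode": "once", "start": 0.0},  # one-shot playback: start time only
}

trigger_sticker = {
    "resource": "money_eyes.mp4",
    "anchor": "eyes",
    "trigger": "blink",                      # open_mouth / blink / smile / raise_eyebrows
    "play": {"mode": "once", "start": 0.0},
}
```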
- The dynamic sticker is drawn while recording, achieving the purpose of real-time visibility.
- A face detection algorithm is also included in the system configuration. It should be noted that an existing face detection algorithm is used, and the algorithm itself is not part of this application.
- The sticker is drawn at an appropriate position according to the face detection result. FIG. 17 shows a structural diagram of the entire system; in one embodiment, the modules in FIG. 17 are all located on the terminal side.
- For trigger-type stickers and facial-features stickers, the sticker information is combined with the feature coordinates and then synthesized with the recorded video; that is, it involves the feature detector and the material parser, and then the video synthesizer, because when expressions, actions, or facial features change, the coordinates change as well.
- the terminal captures the original video through an application (such as a camera application).
- Through the feature detector, the terminal detects the face region in each frame of the original video, or the characteristics of the user motion within the viewfinder frame, and analyzes the specific feature parameters and their corresponding feature coordinates. The feature coordinates include the initial coordinates and the target coordinates after deformation.
- The material parser parses the sticker information and its description file to obtain the sticker information and its attributes, such as the superimposing position and the superimposing time point. According to the feature coordinates and the superimposing position and time point indicated by the description file, the video synthesizer combines the sticker information with the original video being captured to generate a video processing result containing the sticker information.
- For ordinary sticker information, the sticker is directly combined with the recorded video; that is, only the video synthesizer is involved, because the coordinates usually do not change.
- the terminal captures the original video through an application (such as a camera application).
- The terminal receives the sticker information and its description file, which the server sends after matching the text information in the video. The material parser then parses the sticker information and its description file to obtain the sticker information and its attributes, such as the superimposing position and the superimposing time point. According to the superimposing position and time point indicated by the description file, the video synthesizer combines the sticker information with the original video being captured to generate a video processing result containing the sticker information.
- the disclosed apparatus and method may be implemented in other manners.
- the device embodiments described above are merely illustrative.
- The division of units is only a logical functional division; in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
- The coupling, direct coupling, or communication connections between the components shown or discussed may be indirect coupling or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
- The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiments.
- The functional units in the embodiments of the present application may all be integrated into one processing unit, or each unit may serve separately as one unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
- The foregoing program may be stored in a computer-readable storage medium, and when executed, the program performs the steps of the foregoing method embodiments. The foregoing storage medium includes any medium that can store program code, such as a removable storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
- If the above-described integrated unit of the present application is implemented in the form of a software functional module and sold or used as a stand-alone product, it may also be stored in a computer-readable storage medium.
- Based on this understanding, the technical solutions of the embodiments of the present application may, in essence, be embodied in the form of a software product stored in a storage medium and including a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the various embodiments of the present application.
- The foregoing storage medium includes various media that can store program code, such as a removable storage device, a ROM, a RAM, a magnetic disk, or an optical disk.
Abstract
Description
Claims (17)
- An information processing method, comprising: acquiring, by a terminal, a first operation to trigger collection of first media information; when the terminal detects, during collection of the first media information, an expression change in a face region meeting a preset condition or a user action change within the capture frame, reporting the detected variation of the expression change or the user action change to a server as key information; receiving, by the terminal, second media information pushed by the server and corresponding to the key information; and performing video synthesis on the first media information and the second media information.
- The method according to claim 1, wherein the key information further comprises text information in the first media information; and the method further comprises: detecting the text information during collection of the first media information, and reporting the detected text information to the server as key information.
- The method according to claim 2, wherein before performing video synthesis on the first media information and the second media information, the method further comprises: receiving, by the terminal, a description file of the second media information pushed by the server and corresponding to the key information.
- The method according to claim 3, wherein the description file comprises: a position of the second media information relative to the first media information, and a display time of the second media information.
- The method according to claim 4, wherein performing video synthesis on the first media information and the second media information comprises: performing video synthesis on the first media information and the second media information according to the description file, so as to display the second media information, within the display time specified by the description file, at the position of the first media information specified by the description file.
- The method according to any one of claims 2 to 5, wherein the second media information comprises at least one of the following categories: a first type of sticker information whose display is triggered by the expression change or the user action change; and a second type of sticker information whose display is triggered by the text information.
- The method according to claim 6, wherein when the second media information is the first type of sticker information, performing video synthesis on the first media information and the second media information comprises: determining feature initial coordinates and feature target coordinates of the expression change or the user action change, so as to determine a position for superimposing the first type of sticker information, either at the position point located by the feature target coordinates or within the position region defined from the feature initial coordinates to the feature target coordinates; parsing the received description file of the first type of sticker information to obtain a display time of the first type of sticker information; and performing video synthesis on the first type of sticker information and the first media information according to the determined position and the parsed display time.
- The method according to claim 6, wherein when the second media information is the second type of sticker information, performing video synthesis on the first media information and the second media information comprises: parsing the received description file of the second type of sticker information to obtain a position of the second type of sticker information relative to the first media information and a display time of the second type of sticker information; and performing video synthesis on the second type of sticker information and the first media information according to the obtained position and display time.
- A terminal, comprising: a triggering unit, configured to acquire a first operation to trigger collection of first media information; a detection unit, configured to, when an expression change in a face region meeting a preset condition or a user action change within the capture frame is detected during collection of the first media information, report the detected variation of the expression change or the user action change to a server as key information; a receiving unit, configured to receive second media information pushed by the server and corresponding to the key information; and a synthesizing unit, configured to perform video synthesis on the first media information and the second media information.
- The terminal according to claim 9, wherein the key information further comprises text information in the first media information; and the detection unit is further configured to detect the text information during collection of the first media information and report the text information to the server as key information.
- The terminal according to claim 10, wherein the receiving unit is further configured to receive a description file of the second media information pushed by the server and corresponding to the key information.
- The terminal according to claim 11, wherein the description file comprises: a position of the second media information relative to the first media information, and a display time of the second media information.
- The terminal according to claim 12, wherein the synthesizing unit is further configured to perform video synthesis on the first media information and the second media information according to the description file, so as to display the second media information, within the display time specified by the description file, at the position of the first media information specified by the description file.
- The terminal according to any one of claims 10 to 13, wherein the second media information comprises at least one of the following categories: a first type of sticker information whose display is triggered by the expression change or the user action change; and a second type of sticker information whose display is triggered by the text information.
- The terminal according to claim 14, wherein when the second media information is the first type of sticker information, the synthesizing unit is further configured to: determine feature initial coordinates and feature target coordinates of the expression change or the user action change, so as to determine a position for superimposing the first type of sticker information, either at the position point located by the feature target coordinates or within the position region defined from the feature initial coordinates to the feature target coordinates; parse the received description file of the first type of sticker information to obtain a display time of the first type of sticker information; and perform video synthesis on the first type of sticker information and the first media information according to the determined position and the parsed display time.
- The terminal according to claim 14, wherein when the second media information is the second type of sticker information, the synthesizing unit is further configured to: parse the received description file of the second type of sticker information to obtain a position of the second type of sticker information relative to the first media information and a display time of the second type of sticker information; and perform video synthesis on the second type of sticker information and the first media information according to the obtained position and display time.
- A non-volatile storage medium storing a program, wherein when the program stored in the non-volatile storage medium is executed by a computer device comprising one or more processors, the computer device is caused to perform the information processing method according to any one of claims 1 to 8.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020187026680A KR102135215B1 (ko) | 2016-03-14 | 2017-03-14 | 정보 처리 방법 및 단말 |
JP2018527883A JP2019504532A (ja) | 2016-03-14 | 2017-03-14 | 情報処理方法及び端末 |
US15/962,663 US11140436B2 (en) | 2016-03-14 | 2018-04-25 | Information processing method and terminal |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610143985.2A CN105791692B (zh) | 2016-03-14 | 2016-03-14 | 一种信息处理方法、终端及存储介质 |
CN201610143985.2 | 2016-03-14 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/962,663 Continuation US11140436B2 (en) | 2016-03-14 | 2018-04-25 | Information processing method and terminal |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017157272A1 true WO2017157272A1 (zh) | 2017-09-21 |
Family
ID=56392673
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/076576 WO2017157272A1 (zh) | 2016-03-14 | 2017-03-14 | 一种信息处理方法及终端 |
Country Status (5)
Country | Link |
---|---|
US (1) | US11140436B2 (zh) |
JP (1) | JP2019504532A (zh) |
KR (1) | KR102135215B1 (zh) |
CN (1) | CN105791692B (zh) |
WO (1) | WO2017157272A1 (zh) |
Families Citing this family (43)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105791692B (zh) * | 2016-03-14 | 2020-04-07 | 腾讯科技(深圳)有限公司 | 一种信息处理方法、终端及存储介质 |
CN106303293B (zh) * | 2016-08-15 | 2019-07-30 | Oppo广东移动通信有限公司 | 视频处理方法、装置及移动终端 |
CN107343220B (zh) | 2016-08-19 | 2019-12-31 | 北京市商汤科技开发有限公司 | 数据处理方法、装置和终端设备 |
CN106210545A (zh) * | 2016-08-22 | 2016-12-07 | 北京金山安全软件有限公司 | 一种视频拍摄方法、装置及电子设备 |
CN106373170A (zh) * | 2016-08-31 | 2017-02-01 | 北京云图微动科技有限公司 | 一种视频制作方法及装置 |
US11049147B2 (en) * | 2016-09-09 | 2021-06-29 | Sony Corporation | System and method for providing recommendation on an electronic device based on emotional state detection |
CN106339201A (zh) * | 2016-09-14 | 2017-01-18 | 北京金山安全软件有限公司 | 贴图处理方法、装置和电子设备 |
CN106341608A (zh) * | 2016-10-28 | 2017-01-18 | 维沃移动通信有限公司 | 一种基于情绪的拍摄方法及移动终端 |
CN106683120B (zh) * | 2016-12-28 | 2019-12-13 | 杭州趣维科技有限公司 | 追踪并覆盖动态贴纸的图像处理方法 |
JP6520975B2 (ja) * | 2017-03-16 | 2019-05-29 | カシオ計算機株式会社 | 動画像処理装置、動画像処理方法及びプログラム |
US10515199B2 (en) * | 2017-04-19 | 2019-12-24 | Qualcomm Incorporated | Systems and methods for facial authentication |
CN107529029A (zh) * | 2017-07-31 | 2017-12-29 | 深圳回收宝科技有限公司 | 一种在检测文件中添加标签的方法、设备以及存储介质 |
KR101968723B1 (ko) * | 2017-10-18 | 2019-04-12 | 네이버 주식회사 | 카메라 이펙트를 제공하는 방법 및 시스템 |
CN108024071B (zh) * | 2017-11-24 | 2022-03-08 | 腾讯数码(天津)有限公司 | 视频内容生成方法、视频内容生成装置及存储介质 |
US10410060B2 (en) * | 2017-12-14 | 2019-09-10 | Google Llc | Generating synthesis videos |
CN108388557A (zh) * | 2018-02-06 | 2018-08-10 | 腾讯科技(深圳)有限公司 | 消息处理方法、装置、计算机设备和存储介质 |
CN108737715A (zh) * | 2018-03-21 | 2018-11-02 | 北京猎户星空科技有限公司 | 一种拍照方法及装置 |
CN113658298A (zh) * | 2018-05-02 | 2021-11-16 | 北京市商汤科技开发有限公司 | 特效程序文件包的生成及特效生成方法与装置 |
CN110163861A (zh) | 2018-07-11 | 2019-08-23 | 腾讯科技(深圳)有限公司 | 图像处理方法、装置、存储介质和计算机设备 |
CN108958610A (zh) * | 2018-07-27 | 2018-12-07 | 北京微播视界科技有限公司 | 基于人脸的特效生成方法、装置和电子设备 |
CN109213932B (zh) * | 2018-08-09 | 2021-07-09 | 咪咕数字传媒有限公司 | 一种信息推送方法及装置 |
CN109388501B (zh) * | 2018-08-31 | 2024-03-05 | 平安科技(深圳)有限公司 | 基于人脸识别请求的通信匹配方法、装置、设备及介质 |
CN109379623A (zh) * | 2018-11-08 | 2019-02-22 | 北京微播视界科技有限公司 | 视频内容生成方法、装置、计算机设备和存储介质 |
CN109587397A (zh) * | 2018-12-03 | 2019-04-05 | 深圳市优炫智科科技有限公司 | 基于人脸检测动态贴图的儿童相机及其动态贴图方法 |
CN109660855B (zh) * | 2018-12-19 | 2021-11-02 | 北京达佳互联信息技术有限公司 | 贴纸显示方法、装置、终端及存储介质 |
CN111695376A (zh) * | 2019-03-13 | 2020-09-22 | 阿里巴巴集团控股有限公司 | 视频处理方法、视频处理装置及电子设备 |
CN110139170B (zh) * | 2019-04-08 | 2022-03-29 | 顺丰科技有限公司 | 视频贺卡生成方法、装置、系统、设备及存储介质 |
CN112019919B (zh) * | 2019-05-31 | 2022-03-15 | 北京字节跳动网络技术有限公司 | 视频贴纸的添加方法、装置及电子设备 |
CN110784762B (zh) * | 2019-08-21 | 2022-06-21 | 腾讯科技(深圳)有限公司 | 一种视频数据处理方法、装置、设备及存储介质 |
CN110782510A (zh) * | 2019-10-25 | 2020-02-11 | 北京达佳互联信息技术有限公司 | 一种贴纸生成方法及装置 |
CN111177542B (zh) * | 2019-12-20 | 2021-07-20 | 贝壳找房(北京)科技有限公司 | 介绍信息的生成方法和装置、电子设备和存储介质 |
US11675494B2 (en) * | 2020-03-26 | 2023-06-13 | Snap Inc. | Combining first user interface content into second user interface |
CN111556335A (zh) * | 2020-04-15 | 2020-08-18 | 早安科技(广州)有限公司 | 一种视频贴纸处理方法及装置 |
KR20210135683A (ko) * | 2020-05-06 | 2021-11-16 | 라인플러스 주식회사 | 인터넷 전화 기반 통화 중 리액션을 표시하는 방법, 시스템, 및 컴퓨터 프로그램 |
CN111597984B (zh) * | 2020-05-15 | 2023-09-26 | 北京百度网讯科技有限公司 | 贴纸测试方法、装置、电子设备及计算机可读存储介质 |
CN113709573B (zh) | 2020-05-21 | 2023-10-24 | 抖音视界有限公司 | 配置视频特效方法、装置、设备及存储介质 |
CN111627115A (zh) * | 2020-05-26 | 2020-09-04 | 浙江商汤科技开发有限公司 | 互动合影方法及装置、互动装置以及计算机存储介质 |
CN111757175A (zh) * | 2020-06-08 | 2020-10-09 | 维沃移动通信有限公司 | 视频处理方法及装置 |
CN111726701B (zh) * | 2020-06-30 | 2022-03-04 | 腾讯科技(深圳)有限公司 | 信息植入方法、视频播放方法、装置和计算机设备 |
KR20230163528A (ko) * | 2021-03-31 | 2023-11-30 | 스냅 인코포레이티드 | 맞춤화가능한 아바타 생성 시스템 |
US11941227B2 (en) | 2021-06-30 | 2024-03-26 | Snap Inc. | Hybrid search system for customizable media |
US11689765B1 (en) * | 2021-12-31 | 2023-06-27 | The Nielsen Company (Us), Llc | Methods and apparatus for obfuscated audience identification |
CN114513705A (zh) * | 2022-02-21 | 2022-05-17 | 北京字节跳动网络技术有限公司 | 视频显示方法、装置和存储介质 |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006211120A (ja) * | 2005-01-26 | 2006-08-10 | Sharp Corp | 文字情報表示機能を備えた映像表示システム |
JP4356645B2 (ja) * | 2005-04-28 | 2009-11-04 | ソニー株式会社 | 字幕生成装置及び方法 |
EP2194509A1 (en) * | 2006-05-07 | 2010-06-09 | Sony Computer Entertainment Inc. | Method for providing affective characteristics to computer generated avatar during gameplay |
US8243116B2 (en) * | 2007-09-24 | 2012-08-14 | Fuji Xerox Co., Ltd. | Method and system for modifying non-verbal behavior for social appropriateness in video conferencing and other computer mediated communications |
US20100257462A1 (en) * | 2009-04-01 | 2010-10-07 | Avaya Inc | Interpretation of gestures to provide visual queues |
JP2013046358A (ja) * | 2011-08-26 | 2013-03-04 | Nippon Hoso Kyokai <Nhk> | コンテンツ再生装置及びコンテンツ再生プログラム |
US20140111542A1 (en) * | 2012-10-20 | 2014-04-24 | James Yoong-Siang Wan | Platform for recognising text using mobile devices with a built-in device video camera and automatically retrieving associated content based on the recognised text |
US9251405B2 (en) * | 2013-06-20 | 2016-02-02 | Elwha Llc | Systems and methods for enhancement of facial expressions |
US20160196584A1 (en) | 2015-01-06 | 2016-07-07 | Facebook, Inc. | Techniques for context sensitive overlays |
US9697648B1 (en) * | 2015-12-23 | 2017-07-04 | Intel Corporation | Text functions in augmented reality |
- 2016-03-14: CN application CN201610143985.2A filed; granted as CN105791692B (active)
- 2017-03-14: PCT application PCT/CN2017/076576 filed (published as WO2017157272A1)
- 2017-03-14: JP application JP2018527883A filed (status: pending)
- 2017-03-14: KR application KR1020187026680A filed; granted as KR102135215B1
- 2018-04-25: US application US15/962,663 filed; granted as US11140436B2 (active)
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1870744A (zh) * | 2005-05-25 | 2006-11-29 | 冲电气工业株式会社 | 图像合成装置、通信终端、图像通信系统以及聊天服务器 |
CN101453573A (zh) * | 2007-12-04 | 2009-06-10 | 奥林巴斯映像株式会社 | 图像显示装置和照相机、图像显示方法、程序及图像显示系统 |
JP2010066844A (ja) * | 2008-09-09 | 2010-03-25 | Fujifilm Corp | 動画コンテンツの加工方法及び装置、並びに動画コンテンツの加工プログラム |
TW201021550A (en) * | 2008-11-19 | 2010-06-01 | Altek Corp | Emotion-based image processing apparatus and image processing method |
CN102427553A (zh) * | 2011-09-23 | 2012-04-25 | Tcl集团股份有限公司 | 一种电视节目播放方法、系统及电视机和服务器 |
CN105791692A (zh) * | 2016-03-14 | 2016-07-20 | 腾讯科技(深圳)有限公司 | 一种信息处理方法及终端 |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107995499A (zh) * | 2017-12-04 | 2018-05-04 | 腾讯科技(深圳)有限公司 | 媒体数据的处理方法、装置及相关设备 |
CN107995499B (zh) * | 2017-12-04 | 2021-07-23 | 腾讯科技(深圳)有限公司 | 媒体数据的处理方法、装置及相关设备 |
Also Published As
Publication number | Publication date |
---|---|
CN105791692A (zh) | 2016-07-20 |
CN105791692B (zh) | 2020-04-07 |
KR102135215B1 (ko) | 2020-07-17 |
KR20180112848A (ko) | 2018-10-12 |
US20180249200A1 (en) | 2018-08-30 |
JP2019504532A (ja) | 2019-02-14 |
US11140436B2 (en) | 2021-10-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2017157272A1 (zh) | 一种信息处理方法及终端 | |
US11321385B2 (en) | Visualization of image themes based on image content | |
CN110612533B (zh) | 用于根据表情对图像进行识别、排序和呈现的方法 | |
TWI253860B (en) | Method for generating a slide show of an image | |
WO2020063319A1 (zh) | 动态表情生成方法、计算机可读存储介质和计算机设备 | |
CN105190480B (zh) | 信息处理设备和信息处理方法 | |
US8416332B2 (en) | Information processing apparatus, information processing method, and program | |
EP3195601B1 (en) | Method of providing visual sound image and electronic device implementing the same | |
US20220174237A1 (en) | Video special effect generation method and terminal | |
WO2022095757A1 (zh) | 图像渲染方法和装置 | |
US11394888B2 (en) | Personalized videos | |
US20180352191A1 (en) | Dynamic aspect media presentations | |
WO2018177134A1 (zh) | 用户生成内容处理方法、存储介质和终端 | |
CN115529378A (zh) | 一种视频处理方法及相关装置 | |
US20170061642A1 (en) | Information processing apparatus, information processing method, and non-transitory computer readable medium | |
JP2010251841A (ja) | 画像抽出プログラムおよび画像抽出装置 | |
US20230043683A1 (en) | Determining a change in position of displayed digital content in subsequent frames via graphics processing circuitry | |
JP6166070B2 (ja) | 再生装置および再生方法 | |
CN115225756A (zh) | 确定目标对象的方法、拍摄方法和装置 | |
JP2014110469A (ja) | 電子機器、画像処理方法、及びプログラム | |
JP2017211995A (ja) | 再生装置、再生方法、再生プログラム、音声要約装置、音声要約方法および音声要約プログラム | |
US20230326095A1 (en) | Overlaying displayed digital content with regional transparency and regional lossless compression transmitted over a communication network via processing circuitry | |
US20230326094A1 (en) | Integrating overlaid content into displayed data via graphics processing circuitry and processing circuitry using a computing memory and an operating system memory | |
CN113873135A (zh) | 一种图像获得方法、装置、电子设备及存储介质 | |
CN117688196A (zh) | 图像推荐方法、相关装置 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| WWE | Wipo information: entry into national phase | Ref document number: 2018527883; Country of ref document: JP |
| ENP | Entry into the national phase | Ref document number: 20187026680; Country of ref document: KR; Kind code of ref document: A |
| WWE | Wipo information: entry into national phase | Ref document number: 1020187026680; Country of ref document: KR |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17765814; Country of ref document: EP; Kind code of ref document: A1 |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 17765814; Country of ref document: EP; Kind code of ref document: A1 |