CN110300275B - Video recording and playing method, device, terminal and storage medium - Google Patents

Video recording and playing method, device, terminal and storage medium

Info

Publication number
CN110300275B
CN110300275B (granted publication of application CN201810236991.1A)
Authority
CN
China
Prior art keywords
image
playing
data
target
video file
Prior art date
Legal status
Active
Application number
CN201810236991.1A
Other languages
Chinese (zh)
Other versions
CN110300275A (en)
Inventor
王文涛
王清
冯驰伟
师凯凯
赵亮
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201810236991.1A
Publication of CN110300275A
Application granted
Publication of CN110300275B
Legal status: Active

Classifications

    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/161 Human faces: Detection; Localisation; Normalisation
    • G06V40/168 Human faces: Feature extraction; Face representation
    • G06V40/174 Human faces: Facial expression recognition
    • G06V40/176 Facial expression recognition: Dynamic expression
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G10L15/26 Speech recognition: Speech to text systems
    • H04N21/4312 Generation of visual interfaces for content selection or interaction involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H04N21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/4402 Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/44213 Monitoring of end-user related data
    • H04N21/44218 Monitoring of end-user related data: Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program
    • H04N21/44222 Monitoring of end-user related data: Analytics of user selections, e.g. selection of programs or purchase activity
    • H04N21/4516 Management of client data or end-user data involving client characteristics, e.g. Set-Top-Box type, software version or amount of memory available
    • H04N5/265 Studio circuits: Mixing
    • H04N5/278 Studio circuits: Subtitling
    • H04N5/76 Television signal recording

Abstract

The invention discloses a video recording and playing method, apparatus, terminal, and storage medium, belonging to the field of computer technology. The method comprises the following steps: receiving a video playing instruction, wherein the video playing instruction is used for playing a target video file; according to the capability information of the player, acquiring target image data corresponding to the capability information from the image data and the image effect data of the images contained in the target video file, wherein the image effect data of each frame of image indicates the transparent area of the corresponding image during playing; and playing the video based on the target image data; wherein the capability information indicates whether the player has the capability of playing transparent video. A player that cannot play transparent video therefore never displays two images simultaneously; it plays the video based on the target image data alone, so the target video file can be played compatibly across different players.

Description

Video recording and playing method, device, terminal and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, a terminal, and a storage medium for recording and playing a video.
Background
With the development of computer technology, a recorded video file often needs transparency processing so as to highlight the target graphic identifier in each image while displaying the background area transparently. Transparent display means that, for each frame, the target graphic identifier is displayed normally in the display area while the background area around it reveals whatever lies beneath the display area. For example, when the terminal plays such a video file in the current session interface, only the target graphic identifier is displayed, and the background image of the current session interface shows through in the other areas.
In the related art, the video playing process may be as follows: the terminal reads the image data of multiple frames of images, and the image effect data associated with each frame, from the trak data blocks of a video file, where the image effect data indicates the transparent area of the image when it is played. For each frame, the terminal displays the image on the current session interface according to the specified playing logic of the player, and displays the background image of the session interface in the transparent area of the frame as indicated by the image effect data. The specified playing logic is logic that displays the image transparently based on the indication of the image effect data.
In the process of implementing the invention, the inventor finds that the related art has at least the following problems:
the above technical process defaults to simultaneously displaying two images in the display area, one generated from the image data of the video picture and one generated from the image effect data. In players that do not define the transparent-display logic, this produces a poor actual playing effect.
Disclosure of Invention
The embodiment of the invention provides a video recording and playing method, a video recording and playing device, a video recording and playing terminal and a video recording and playing storage medium, which can solve the problem of poor playing effect in the related technology. The technical scheme is as follows:
in a first aspect, a video playing method is provided, where the method includes:
receiving a video playing instruction, wherein the video playing instruction is used for playing a target video file;
according to the capability information of the player, acquiring target image data corresponding to the capability information from image data of an image and image effect data of the image contained in the target video file, wherein the image effect data of each frame of image is used for indicating a transparent area of the corresponding image during playing;
playing a video based on the target image data;
wherein the capability information is used to indicate whether the player has the capability of playing transparent video, and different capability information corresponds to different types of data of the image.
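As a minimal illustration of this selection (names and structures are hypothetical; the patent does not define an API), a Python sketch might look like:

```python
# A minimal sketch of capability-based selection; all names are
# illustrative assumptions, not part of the patent.

def select_target_image_data(video_file: dict, can_play_transparent: bool) -> dict:
    """Return only the data matching the player's capability information."""
    if can_play_transparent:
        # A capable player receives both the picture frames and the
        # per-frame effect data marking the transparent areas.
        return {"frames": video_file["image_data"],
                "effects": video_file["image_effect_data"]}
    # An incapable player receives the picture frames alone, so the
    # effect data can never be rendered as a second, spurious image.
    return {"frames": video_file["image_data"]}
```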
In a second aspect, a video recording method is provided, the method comprising:
when a video recording instruction is received, acquiring image data of a plurality of frames of images of a target graphic identifier in real time;
acquiring image effect data of the multiple frames of images, wherein each frame of image effect data is used for indicating a transparent area of the corresponding image when the corresponding image is played;
when a recording completion instruction is received, generating a target video file based on the image data of the multiple frames of images and the image effect data of the multiple frames of images, wherein a first type data block in the target video file stores the image data of the multiple frames of images, and a second type data block in the target video file stores the image effect data of the multiple frames of images;
wherein the data block types of the first type data block and the second type data block are different.
In a third aspect, a video playing apparatus is provided, the apparatus comprising:
the receiving module is used for receiving a video playing instruction, and the video playing instruction is used for playing a target video file;
the acquisition module is used for acquiring target image data corresponding to the capability information from image data of an image and image effect data of the image contained in the target video file according to the capability information of the player, wherein the image effect data of each frame of image is used for indicating a transparent area of the corresponding image during playing;
the playing module is used for playing videos based on the target image data;
wherein the capability information is used to indicate whether the player has the capability of playing transparent video, and different capability information corresponds to different types of data of the image.
In a fourth aspect, there is provided a video recording apparatus, the apparatus comprising:
the acquisition module is used for acquiring the image data of the multi-frame image of the target graphic identifier in real time when receiving a video recording instruction;
the acquisition module is further configured to acquire image effect data of the multiple frames of images, where each frame of image effect data is used to indicate a transparent area of a corresponding image when the corresponding image is played;
the generating module is used for generating a target video file based on the image data of the multiple frames of images and the image effect data of the multiple frames of images when a recording completion instruction is received, wherein a first type data block in the target video file stores the image data of the multiple frames of images, and a second type data block in the target video file stores the image effect data of the multiple frames of images;
wherein the data block types of the first type data block and the second type data block are different.
In a fifth aspect, a terminal is provided, where the terminal includes a processor and a memory, and the memory stores at least one instruction, where the instruction is loaded and executed by the processor to implement the operation performed by the video playing method according to the first aspect or the operation performed by the video recording method according to the second aspect.
In a sixth aspect, a computer-readable storage medium is provided, in which at least one instruction is stored, and the instruction is loaded and executed by a processor to implement the operation performed by the video playing method according to the first aspect or the operation performed by the video recording method according to the second aspect.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
according to the method provided by the embodiment of the invention, when the target video file is played, the target image data corresponding to the capability information of the player is acquired from the image data and the image effect data of the image contained in the target video file, so that the player incapable of playing the transparent video cannot acquire the image effect data from the playing source. For a player which does not have transparent video playing capability, two images cannot be displayed simultaneously, one is an image of a video picture, and the other is an image generated based on image effect data, and video playing is performed only based on the target image data, so that a target video file can be played compatibly in different players.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic illustration of an implementation environment provided by an embodiment of the invention;
fig. 2 is a flowchart of a video recording method according to an embodiment of the present invention;
fig. 3 is a flowchart of a recording link and a rendering link according to an embodiment of the present invention;
FIG. 4 is a diagram of an MPEG-4 file structure according to an embodiment of the present invention;
fig. 5 is a flowchart of a video playing method according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a target video file transmission according to an embodiment of the present invention;
fig. 7 is a flowchart of a video recording and playing method according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a session interface provided by an embodiment of the invention;
FIG. 9 is a schematic diagram of a session interface provided by an embodiment of the invention;
FIG. 10 is a schematic illustration of a session interface provided by an embodiment of the invention;
FIG. 11a is a schematic diagram of a session interface provided by an embodiment of the invention;
FIG. 11b is a diagram illustrating an example of a session interface according to an embodiment of the present invention;
FIG. 12 is a schematic diagram illustrating comparison between effects of a session interface provided by an embodiment of the present invention;
FIG. 13 is a diagram illustrating an example of a conversational interface provided by an embodiment of the invention;
fig. 14 is a schematic structural diagram of a video playback device according to an embodiment of the present invention;
fig. 15 is a schematic structural diagram of a video recording apparatus according to an embodiment of the present invention;
fig. 16 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present invention, where the implementation environment includes: the terminal 101, where a play function is set in advance on the terminal 101, and the play function may be implemented by a player of the terminal 101 or a play plug-in of a certain application program, and the terminal 101 may play a video file through the play function.
For some scenarios, the terminal 101 may be provided with a configuration file that enables playing of transparent video; that is, when playing a video file, only the graphic identifier in each frame of the video file is displayed, and the background portion of the image outside the graphic identifier is not displayed, achieving a background-transparent effect. For example, when a video is played transparently on the session interface, only the graphic identifier is displayed in the session interface, which greatly improves the space utilization of the session interface.
In a possible scenario, the implementation environment may further include a terminal 102, where the terminal 102 may establish a communication connection with the terminal 101, and after the terminal 101 records a video file, the video file is sent to the terminal 102, and the terminal 102 may play the video file through a playing function.
When the terminal 102 is configured with a configuration file that enables playing of transparent video, the terminal 102 can play the video with the background-transparent effect. If the terminal 102 lacks that configuration file, the background-transparent effect is not shown during playback.
Note that the video file includes image data of a plurality of frames of images and image effect data of each frame of image. The multi-frame image includes actual picture content to be played. Each frame of image may correspond to a frame of image effect data indicating a transparent area of the corresponding image when played back, and not including actual picture content. The number of the terminals 102 may be one or more, and the embodiment of the present invention is not limited in this respect. For example, the terminals 101 and 102 may be any Device such as a mobile phone terminal, a PAD (Portable Android Device) terminal, or a computer terminal. The player may be any application that can play video files.
Next, a specific process of recording a video file capable of playing a background transparency effect will be described. Fig. 2 is a flowchart of a video recording method according to an embodiment of the present invention. The execution subject of the embodiment of the present invention is a terminal, and referring to fig. 2, the method includes:
201. when a video recording instruction is received, the terminal acquires the image data of the multi-frame image of the target graphic identifier in real time.
The target graphic identifier may be a graphic, provided by the recording function, that mainly contains a target object; specifically, the target object may be a virtual object or a designated portion of a portrait image. In the embodiment of the invention, when a user needs to record a video, the user can trigger video recording through a trigger operation. When the terminal detects the user's trigger operation, a video recording instruction is triggered; upon receiving the video recording instruction, the terminal records the target user, determines the target graphic identifier corresponding to the target user, and generates image data of multiple frames of images of the target graphic identifier.
The trigger operation is used to trigger the terminal to start recording video. When the user is in a conversation with other users, the terminal may display a record button on the current session interface, and the trigger operation may be the user clicking the record button. When the terminal detects that the record button is clicked, it may display icons of multiple virtual objects on the current session interface; for example, the virtual objects may be cartoon characters such as penguins and pandas. The user can select one of the virtual objects as the target object; the terminal obtains the selected target object and obtains the target graphic identifier by combining it with the behavior change information of the user. Alternatively, the user may directly use the graphic of a designated part in the portrait image of the target user as the target graphic identifier, or the terminal may combine the portrait image and the virtual object to obtain the target graphic identifier.
Based on several possibilities of the target graphical identification, this step can be implemented in the following three ways.
In the first mode, the terminal acquires behavior change information of a target user in real time, synchronizes the behavior change information to the target graphic identifier in real time, and generates image data of a multi-frame image of the target graphic identifier.
In the embodiment of the invention, the terminal can simulate the behavior change of the target user through the virtual object. Wherein the behavior change information includes expression change information and/or limb action change information of the target user. When the terminal starts to record, the target user can make various expression changes or various limb actions in front of the camera of the terminal, the terminal can acquire behavior change information of the target user through the camera, and a target graphic identifier corresponding to the behavior change information is acquired based on the behavior change information to describe the behavior change condition of the target user; and the terminal generates the image data of the multi-frame image of the target graphic identifier according to the change process of the target graphic identifier in the shooting time. Wherein the image comprises the target graphical identifier and a background area around the target graphical identifier.
The terminal can synchronize the behavior change information to the three-dimensional model, then convert the three-dimensional model into a two-dimensional target graphic identifier, and also can directly convert the behavior change information into a two-dimensional target graphic identifier. Specifically, the step of acquiring, by the terminal, the image data of the multi-frame image of the target graphic identifier may include the following two cases.
In the first case, the terminal synchronizes the behavior change information to the three-dimensional model of the target graphic identifier to simulate the behavior change of the target user, and acquires the image data of multiple frames of two-dimensional images of the three-dimensional model based on the synchronization process of the three-dimensional model of the target graphic identifier. The behavior change information may be expression change information of the user.
In the embodiment of the invention, the three-dimensional model is a three-dimensional model of a virtual object, the terminal can perform face recognition on a target user, detect the change of the facial features of the target user in real time, and control the changes of the facial features of the three-dimensional model to be the same as the changes of the facial features of the target user, thereby achieving the effect of simulating the behavior change of the target user. The step of acquiring the image data of the multi-frame two-dimensional image of the three-dimensional model based on the synchronization process of the three-dimensional model by the terminal may be: in the synchronization process, the terminal maps the three-dimensional image of the three-dimensional model to the two-dimensional plane every other preset time length to obtain image data of one frame of two-dimensional image.
The terminal can determine the pixel value and transparency value of the three-dimensional image along the mapping direction based on the specified resolution and sampling frequency, and map the three-dimensional image onto the two-dimensional plane. In practice, the terminal may set the transparency value and pixel value of every pixel in the background region outside the virtual object to the same values: typically the pixel value of the background region is set to the value corresponding to white, and the transparency value of each background pixel is set to a designated transparency value indicating that the pixel is transparent, which may be 0. During mapping, the terminal thus maps the three-dimensional image into a frame of image containing the target graphic identifier and a white background region.
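The following sketch illustrates this composition step; it is only a schematic example assuming NumPy arrays, with a designated transparency value of 0 and a white background pixel value as described above:

```python
import numpy as np

DESIGNATED_TRANSPARENCY = 0                       # 0 indicates a transparent pixel
WHITE = np.array([255, 255, 255], dtype=np.uint8)

def compose_mapped_frame(rgb: np.ndarray, is_background: np.ndarray) -> np.ndarray:
    """rgb: (H, W, 3) mapped picture; is_background: (H, W) boolean mask.

    Returns an (H, W, 4) RGBA frame whose background is white and fully
    transparent, leaving only the target graphic identifier visible.
    """
    h, w, _ = rgb.shape
    rgba = np.empty((h, w, 4), dtype=np.uint8)
    rgba[..., :3] = np.where(is_background[..., None], WHITE, rgb)
    rgba[..., 3] = np.where(is_background, DESIGNATED_TRANSPARENCY, 255).astype(np.uint8)
    return rgba
```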
During face recognition, the terminal may detect a plurality of first feature points for representing facial features, and represent current features of facial features of the target user through the plurality of first feature points, for example, represent a smile expression that a mouth is currently two-sided mouth with upwarped corners through the first feature points located in the mouth. The terminal acquires a plurality of first feature points every preset time, and determines the current features of facial features of a target user based on the first feature points; and the terminal adjusts the five sense organs of the virtual object in the three-dimensional model to be consistent with the five sense organs of the face according to the current characteristics of the five sense organs of the face.
The terminal can store the corresponding relation between the facial features and the virtual object features, directly selects the facial features of the virtual object corresponding to the current features of the facial features from the corresponding relation between the facial features and the virtual object features according to the current features of the facial features, and adjusts the current facial features of the virtual object to the selected facial features. Or, the terminal may further scale the first feature point based on the first feature point according to a scaling ratio between the face and the face of the virtual object, obtain a second feature point after scaling, and draw the five sense organs of the virtual object in the three-dimensional model based on the second feature point.
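A short sketch of the scaling case, assuming a uniform scale about a face-center origin (the patent states only that the first feature points are scaled by the face-to-virtual-object ratio, so the exact transform here is an assumption):

```python
from typing import List, Tuple

Point = Tuple[float, float]

def scale_feature_points(first_points: List[Point], ratio: float,
                         origin: Point = (0.0, 0.0)) -> List[Point]:
    """Scale first feature points about an origin to obtain second feature points."""
    ox, oy = origin
    return [((x - ox) * ratio + ox, (y - oy) * ratio + oy) for x, y in first_points]

# e.g. mapping detected mouth landmarks onto a half-size virtual face:
# scale_feature_points([(120.0, 200.0), (160.0, 200.0)], ratio=0.5, origin=(140.0, 180.0))
```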
The terminal can send out detection rays through the dot matrix projector and the TOF (Time Of Flight) camera to identify the face Of the target user, the detection rays can be infrared light, and the terminal can also realize the face identification through at least two cameras. The preset duration and the mapping direction may be set based on user needs, for example, the preset duration may be 0.5 ms, 1 ms, and the like; the mapping direction may be directly in front of the virtual object face.
It should be noted that the terminal can capture the dynamic changes of the target user's facial features in three dimensions to obtain facial three-dimensional data, and synchronize those changes to the three-dimensional model of the virtual object, so that behavior changes in three-dimensional space are captured more accurately, the target graphic identifier is more vivid and lifelike, and the accuracy of video recording is greatly improved. Moreover, simulating the human face through a virtual object makes video recording more engaging and brings a more interesting user experience.
In the second case, the terminal directly synchronizes the behavior change information, which may be the limb motion change information, to the two-dimensional image.
The terminal can store the action characteristics of the limb action and the image of the target graphic identifier corresponding to the action characteristics in a pre-association manner. The method comprises the following steps: the terminal collects the current action characteristics of the limb action of the target user, determines a two-dimensional image corresponding to the current action characteristics from the corresponding relation between the action characteristics of the limb action and the image of the target graphic identifier, and takes the image data of the two-dimensional image as the image data of the image of the target graphic identifier. Of course, the terminal may collect the current motion characteristics at regular intervals based on time, and obtain the two-dimensional image of the target graphic identifier based on the current motion characteristics, thereby obtaining the multi-frame image of the target graphic identifier.
In the two-dimensional image of the target graphic identifier, the transparency value of each pixel point in the background region may be preset to the designated transparency value.
And in the second mode, the terminal shoots the target user to obtain the image data of the plurality of frames of portrait images of the target user, and directly uses the image data of the portrait images as the image data of one frame of image of the target graphic identifier.
In this step, the terminal may shoot the target user through the camera to obtain a portrait image of the target user. In this case, the target graphic identifier may be the graphic of a designated part of the subject in the portrait image. Therefore, for each frame of portrait image data, the terminal directly takes the image data of the portrait image as one frame of image of the target graphic identifier. The designated part can be the face of the target user, or a limb part such as an arm or a leg.
The terminal can firstly perform image recognition on the portrait image, detect whether the portrait image includes the figure of the specified part, and when the portrait image includes the figure of the specified part, the terminal determines the image data of the portrait image as the image data of one frame of image identified by the target figure; otherwise, the terminal discards the portrait image.
When the portrait image is used as the image of the target graphic identifier, the background area of the image is the area of the portrait image corresponding to the environment in which the subject is located. The transparency values and pixel values of the target graphic identifier and the background area in the image are obtained from the actual shooting. Because the terminal directly uses the graphic of a real part of the human body as the target graphic identifier, the realism of the video is greatly increased.
And in the third mode, the terminal shoots the target user to obtain the image data of the multi-frame portrait image of the target user, and the face part in the portrait image is added to the target graphic identifier to generate the image data of the multi-frame image of the target graphic identifier.
In the embodiment of the invention, the terminal can also combine the portrait image with the image of the virtual object to obtain the image of the target graphic identifier. Specifically, the terminal may shoot the target user through the camera to obtain image data of a portrait image of the target user, perform image recognition on the portrait image, recognize a face portion of the target user, add the face portion to the two-dimensional image of the virtual object in an image editing manner, and use the image data of the two-dimensional image of the virtual object after the addition as image data of an image identified by a frame of target graphics.
The image editing process may be: the terminal cuts the portrait image along the face part of the target user and the boundary area around the face to obtain the face image; and the terminal acquires the image of the virtual object of the three-dimensional model, identifies the face area in the image of the virtual object, and attaches the face image to the face area of the face in the image of the virtual object, so as to obtain an image of a frame of target image identifier. Of course, the terminal may capture a frame of portrait image of the target user at regular intervals, and obtain a plurality of frames of images of the target graphic identifier in the image editing manner.
In the image of the target graphic identifier obtained by combining the image of the virtual object and the portrait image, the transparency value of each pixel point in the background image may be set to be the designated transparency value as in the first mode.
It should be noted that, in actual processing, the terminal may complete the acquisition of video data through a rendering link and a recording link. The rendering link converts the behavior change information of the target user, or the image data of the target user's portrait image, into image data of the image of the target graphic identifier; the recording link adds that image data to the video data so that the video data can be packaged and the target video file subsequently generated. Further, the terminal may execute the rendering link and the recording link synchronously by having them share a pixel buffer (PixelBuffer). Specifically, the execution flow of the recording link and the rendering link is shown in fig. 3: when the terminal acquires the image data of one frame of the target graphic identifier through the rendering link, the frame is stored in the pixel buffer; meanwhile, whenever image data of the target graphic identifier exists in the pixel buffer, the terminal synchronously executes the recording link and adds the recorded image data to the video data to be packaged.
The terminal synchronously executes the two link processes in a space sharing mode, so that the terminal can acquire the image data of the image added with the target graphic identifier, the processing speed of the terminal is greatly optimized, and the video recording efficiency is improved.
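To make the shared-buffer idea concrete, the sketch below models the two links as a producer and a consumer around a thread-safe queue. This is only an analogy: the patent describes sharing a platform pixel buffer, not a Python queue.

```python
import queue
import threading

pixel_buffer: queue.Queue = queue.Queue(maxsize=8)  # stands in for the shared PixelBuffer

def rendering_link(frame_source):
    """Deposit each rendered frame (image data plus effect data) in the buffer."""
    for frame in frame_source:
        pixel_buffer.put(frame)          # blocks when the buffer is full
    pixel_buffer.put(None)               # sentinel: rendering finished

def recording_link(video_data: list):
    """Drain frames from the buffer into the video data to be packaged."""
    while (frame := pixel_buffer.get()) is not None:
        video_data.append(frame)

if __name__ == "__main__":
    frames_out: list = []
    consumer = threading.Thread(target=recording_link, args=(frames_out,))
    consumer.start()
    rendering_link(iter([{"image": b"...", "effect": b"..."}]))  # dummy frames
    consumer.join()
```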
It should be noted that by combining the portrait image of the target user with the virtual image of the virtual object, the terminal produces a more personalized target graphic identifier that replaces the image of the target user, so the recorded video retains the user's individual characteristics while adding many interesting elements; the recorded video is thus unique to the target user and attracts more users' attention.
202. And the terminal acquires the image effect data of the multi-frame image.
In the embodiment of the invention, each frame of image effect data is used for indicating the transparent area of the corresponding image during playing. The transparent area may be a background area around the target graphic identifier in the image, and the other areas in the image may be non-transparent areas, that is, areas occupied by the target graphic identifier in the image. The terminal can obtain the transparency value corresponding to the target graphic identifier and the transparency value corresponding to the background area in the image, and the transparency value corresponding to the target graphic identifier and the transparency value corresponding to the background area form the image effect data.
For each frame of image, because the mode of obtaining the background area of the image is different, the mode of obtaining the image effect data by the terminal is also different, and specifically, the step can be realized by the following two modes.
In the first manner, when the transparency value of each pixel point in the background region of the image is the designated transparency value, that is, the image data of the image obtained in the first manner and the third manner in step 201, the terminal may extract the transparency value of each pixel point in the image, and form the transparency value of each pixel point into the image effect data.
In the first and third manners of step 201, the target graphic identifier is preset, the background area is also preset, and its transparency value has already been set to the designated transparency value, so the image effect data generated in these manners can indicate the transparent area of the image through the designated transparency value. The terminal sets the production parameters of the image effect data to be the same as those of the image.
In a second manner, when the image data of the image is the image data of the image obtained in the second manner in step 201, for each frame of image, the terminal performs image recognition on the target graphic identifier in the image, identifies the target graphic identifier in the image, that is, the graphic corresponding to the target user in the portrait image, determines a background area in the image other than the target graphic identifier as a transparent area, and determines an area occupied by the target graphic identifier in the image as a non-transparent area. And the terminal sets the transparency values of the pixel points in the transparent area as specified transparency values, does not process the transparency values of the pixel points in the nontransparent area, extracts the transparency values of each pixel point of the image and forms the image effect data.
In a possible design, the terminal may further set an edge region between the transparent region and the non-transparent region, and set a transparency value of each pixel point in the edge region based on the transparency values of the transparent region and the non-transparent region, so that the transparency value of each pixel point in the edge region is between the transparency value of the transparent region and the transparency value of the non-transparent region, so that the transparent region and the non-transparent region of the image have a transitional effect.
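One possible realization of such a transitional edge (the patent does not fix the method) is to feather a hard transparency mask, for example with a Gaussian blur, so that border pixels take values between the transparent and non-transparent extremes; a sketch assuming NumPy and SciPy:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

DESIGNATED_TRANSPARENCY = 0.0   # transparent region
OPAQUE = 1.0                    # non-transparent (target graphic) region

def effect_data_with_edge(non_transparent: np.ndarray, edge_sigma: float = 2.0) -> np.ndarray:
    """non_transparent: (H, W) boolean mask of the target graphic region.

    Returns a float mask whose edge pixels lie between 0 and 1, giving the
    transitional effect between the transparent and non-transparent regions.
    """
    hard = np.where(non_transparent, OPAQUE, DESIGNATED_TRANSPARENCY)
    return gaussian_filter(hard, sigma=edge_sigma)  # softens only the border zone
```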
The image data of the image and the image effect data of the image have the same production parameters, where the production parameters include sampling frequency, resolution and/or timestamp. The terminal can generate the image effect data with the same sampling frequency and resolution as the image data, which improves the accuracy of the transparent area indicated by the image effect data during subsequent playback. The terminal also keeps the timestamp of the image effect data consistent with the timestamp of the image; this timestamp guarantees an exact correspondence between each frame of image and each frame of image effect data, improving playback accuracy.
It should be noted that, during the actual technical processing, the rendering link is further configured to obtain image data of an image of the target graphic identifier in real time to produce image effect data, and the recording link is further configured to add the image effect data to the recording data to be packaged. Similarly, as shown in fig. 3, when the terminal generates image effect data, the terminal may also synchronously execute the generation process of the image effect data in the rendering link and the recording link in a manner of sharing a pixel buffer, so that the terminal may add the image effect data while acquiring, thereby optimizing the generation process of the image effect data by the terminal and improving the efficiency of video recording.
203. And when receiving a recording completion instruction, the terminal generates a target video file based on the image data of the multi-frame image and the image effect data of the multi-frame image.
In the embodiment of the invention, the first type data block in the target video file stores the image data of the multiple frames of images, and the second type data block stores the image effect data of those frames; the first type and second type data blocks have different data block types. For example, the packaging format of the target video file may be the MPEG-4 (Moving Picture Experts Group-4) format. As shown in fig. 4, an MPEG-4 video file may include trak data blocks and udta data blocks; the first type data block is a trak data block of the target video file and the second type data block is a udta data block. The trak data block can be read by players with any capability information, while the udta data block is read only by players with the capability of playing transparent video. Of course, the second type data block may also be any other data block in the target video file that is not read by players lacking the capability of playing transparent video but is read by players having that capability; the embodiment of the present invention does not specifically limit this.
When the terminal receives the recording completion instruction, the terminal encapsulates the image data of the multi-frame image in the first type data block, encapsulates the image effect data of the multi-frame image in the second type data block, and generates the target video file by the first type data block and the second type data block.
The terminal can also package the image and the information of the production parameters, producer and the like of the image effect data on the appointed byte positions of the first type data block and the second type data block respectively. Certainly, the terminal may also add some other information based on the user requirement, for example, text information such as a personal identifier and a copyright identifier customized by the target user, which is not limited in the embodiment of the present invention.
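The sketch below shows the box-level idea only: a size/type/payload layout with image data in one box type and effect data in another. A real MPEG-4 muxer writes a full box hierarchy (moov, sample tables, codec configuration), so treat this strictly as a schematic.

```python
import struct

def mp4_box(box_type: bytes, payload: bytes) -> bytes:
    """Serialize one ISO-BMFF-style box: 32-bit size, 4-character type, payload."""
    return struct.pack(">I", 8 + len(payload)) + box_type + payload

def package_target_video(image_data: bytes, effect_data: bytes) -> bytes:
    first_type = mp4_box(b"trak", image_data)    # readable by any player
    second_type = mp4_box(b"udta", effect_data)  # read only by capable players
    return first_type + second_type
```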
In a possible design, the terminal may further acquire voice data of the target user in a process of acquiring a plurality of frames of images of the target graphic identifier in real time, so that the terminal may store the voice data in the first type data block of the target video file when generating the target video file.
Of course, when the terminal collects the voice data, it can record the timestamp corresponding to the voice data. Specifically, each time the terminal acquires one frame of the target graphic identifier in real time, it synchronously collects one segment of voice data, so that the timestamp of the collected voice data is the same as that of the frame and the voice data corresponds to the frame one to one. The terminal can encapsulate the timestamp of the voice data at the specified byte position of the first type data block, so that during subsequent playing, the player can play the voice data in synchrony with the frame it corresponds to based on the timestamp. The terminal can also treat the collected voice data as original voice data, apply a voice change to it using prestored voice-change parameters, take the voice-changed data as the voice data corresponding to each frame, and play the changed voice.
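As a schematic of the one-to-one pairing above (structures are hypothetical), each capture produces a frame and a voice segment stamped with the same time:

```python
from dataclasses import dataclass

@dataclass
class RecordedSample:
    timestamp_ms: int   # shared by the frame and its voice segment
    image: bytes        # one frame of the target graphic identifier
    voice: bytes        # voice data collected at the same instant

def capture_sample(clock_ms: int, frame: bytes, audio: bytes) -> RecordedSample:
    # The frame and voice segment are captured together, so both carry the
    # same timestamp and correspond one to one for later synchronized playback.
    return RecordedSample(timestamp_ms=clock_ms, image=frame, voice=audio)
```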
In one possible design, the terminal may add a special effect to the image based on the voice data. The process may be: the terminal performs voice recognition on the collected voice data of the target user, determines the state information of the target user's current behavior from the recognition result, and, based on that state information, adds a corresponding graphic special effect to the image of the target graphic identifier. For example, when the terminal recognizes that the target user says "no", it may add text such as "No!" together with a swaying graphic special effect to the image of the target graphic identifier. When the terminal recognizes that the target user makes a crying sound (even if no actual tears appear), it may, in an exaggerated manner, add a special effect of rolling teardrops at the eye positions in the image of the target graphic identifier.
It should be noted that collecting the target user's voice data while collecting the images of the target graphic identifier greatly increases the information content of the recorded video. In addition, the current state of the target user is perceived both visually and aurally through the image and voice data, and the image is enhanced with special effects based on the collected voice data, so the recorded video is closer to the target user's actual state and the accuracy of video recording is improved. Furthermore, by adding exaggerated graphic special effects to the image and changing the voice of the original voice data, the video data is enriched and the recorded video becomes more interesting, which raises user engagement with the terminal's video recording function.
In the MPEG-4 file, the trak data block is the data block dedicated to storing video data, so regardless of whether the player has the capability of playing transparent video, it can acquire the images from the trak data block and play them. The udta data block stores a user's custom information about the target video file. In a player with the capability of playing transparent video, the playing logic defines in advance the logic for reading the udta data block and transparently displaying the image based on its image effect data, so that the player can play transparent video based on the images in the trak data block and the image effect data in the udta data block. In a player without that capability, the logic for reading the udta data block is undefined, so the player never acquires the image effect data and never displays it, which avoids the problem of displaying the image effect data and the actual image simultaneously.
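A hedged sketch of this playback asymmetry (parsing and rendering details omitted; function names are ours, not the patent's):

```python
def render_transparent(frames, effects):
    """Composite frames, making the areas marked by the effect data transparent."""

def render_plain(frames):
    """Draw the frames as ordinary opaque video."""

def play(boxes: dict, has_transparent_logic: bool) -> None:
    frames = boxes["trak"]                    # every player reads the trak data
    if has_transparent_logic:
        effects = boxes.get("udta")           # only capable players define this read
        render_transparent(frames, effects)
    else:
        render_plain(frames)                  # the effect data is never touched
```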
204. And when receiving the sending instruction, the terminal displays the video cover of the target video file in the conversation area of the current conversation interface.
In the embodiment of the present invention, when the terminal generates the target video file, taking a session scene of a social application as an example, the terminal may display a video cover of the target video file in a session area of a current session interface based on the sending instruction, send the target video file to at least one user participating in the session, and play the video file by the terminal of each user.
The sending instruction can be triggered by a user or automatically triggered by the terminal when the recording is finished. Specifically, the terminal may display a sending option and a canceling option on the current session interface; the user may trigger the send instruction by clicking the send option, or the send instruction may be triggered when the recording of the target video file is completed.
During playback, the terminal can acquire the image data of the images of the target graphic identifier from the first type data block, acquire the image effect data of those images from the second type data block, display the target graphic identifier of each image on the current session interface, and set the area of the image corresponding to the area indicated by the image effect data to a transparent display state.
In one possible design, when the target video file contains voice data, the terminal displays a video cover of the target video file in a session area of a current session interface and displays a voice message box on the video cover when receiving a transmission instruction.
In the embodiment of the present invention, in a multi-user session scenario, the terminal may display a preview option in the session interface before sending the target video file. When the user clicks the preview option, a preview instruction is triggered, and upon receiving the preview instruction the terminal plays the target video file on the current session interface.
In the embodiment of the present invention, when recording a video, the terminal acquires the image data of multiple frames of images of the target graphic identifier and the corresponding image effect data, and encapsulates the image data and the image effect data in different types of data blocks, chosen according to the data block types read by players of different capabilities, to obtain the target video file. As a result, each player can only acquire the target image data matched with its own playing capability. Because the two kinds of data are stored separately during recording, a player incapable of playing transparent video is prevented from simultaneously displaying two images (one the image of the video picture, the other an image generated from the image effect data), so the target video file can be played compatibly by different players.
Fig. 5 is a flowchart of a video playing method according to an embodiment of the present invention. The execution subject of the embodiment of the present invention is a terminal, and referring to fig. 5, the method includes:
501. The terminal receives a video playing instruction.
In the embodiment of the present invention, the video playing instruction is used to play a target video file. It can be triggered by a receiving event in which the terminal receives the target video file, or by the user's click operation on the target video file. Accordingly, this step can be implemented in the following two ways.
In the first manner, when the video playing instruction is triggered by the user, this step may be: the terminal receives a video playing instruction triggered by the user.
In the embodiment of the present invention, the terminal can display a play button on the video cover of the target video file; when a user wants to browse a video file, the user can click the play button. When the terminal detects the click operation, it receives the video playing instruction triggered by the user.
In the second manner, when the video playing instruction is triggered by a receiving event, this step may be: when the terminal detects a receiving event of receiving the target video file, the video playing instruction for the target video file is triggered.
502. According to the capability information of the player, the terminal acquires target image data corresponding to the capability information from the image data of the images and the image effect data of the images contained in the target video file.
The capability information is used to indicate whether the player has the capability of playing transparent video, and different capability information corresponds to different types of image data. The target video file contains the image data of multiple frames of images and the image effect data corresponding to each frame of image. The terminal may obtain the capability information from the configuration file corresponding to the playing function.
Depending on the playing capability indicated by the capability information, this step can be implemented in the following two ways.
In the first mode, when the capability information indicates that the player does not have the capability of playing transparent video, the terminal acquires only the image data of the images contained in the target video file, not the image effect data of the images.
In the embodiment of the present invention, the target video file contains the image data of the images and the image effect data of the images, encapsulated in data blocks of different types: the image data of the images in the first type data block of the target video file, and the image effect data of the images in the second type data block. This step may then be: the terminal acquires the image data of the images contained in the target video file from the first type data block of the target video file and uses that image data as the target image data.
The target video file is obtained by encapsulating a plurality of data blocks of different types. The first type data block may be the trak data block of the target video file and stores the images of the target video file; players of any playing capability can read the data in the first type data block, so even a player without the capability of playing transparent video can obtain the images from it for playing. The second type data block may be the udta data block of the target video file, which is not read by players that cannot play transparent video.
Acquiring the image data of the images from the first type data block is, in effect, the process of reading that image data. In practice, the terminal decapsulates the target video file, extracts the first type data block from the decapsulated file, and reads the image data in the first type data block.
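A minimal sketch of this decapsulation step, following the simplified box layout assumed in the earlier writer sketch (a real MPEG-4 demuxer would resolve samples through the trak sample tables):

```python
import struct

def iter_boxes(data: bytes):
    # Walk top-level MPEG-4 style boxes: a 4-byte big-endian size covering
    # the whole box, then a 4-byte type code, then the payload.
    offset = 0
    while offset + 8 <= len(data):
        size, = struct.unpack_from(">I", data, offset)
        if size < 8:
            break  # malformed box; stop rather than loop forever
        yield data[offset + 4:offset + 8], data[offset + 8:offset + size]
        offset += size

def read_image_data(mp4_bytes: bytes) -> bytes:
    # Mode 1: a player without transparent-video capability reads only the
    # first type data block and ignores everything else in the file.
    for box_type, payload in iter_boxes(mp4_bytes):
        if box_type == b"mdat":
            return payload
    return b""
```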
It should be noted that when the player cannot play transparent video, it cannot read the second type data block, so it can only acquire the image data of the images and not the image effect data. The play source therefore never supplies such a player with both the image effect data and the images for display, multiple players can play the target video file compatibly, and the application range of the target video file is expanded.
In the second mode, when the capability information indicates that the player has the capability of playing transparent video, the terminal acquires both the image data of the images contained in the target video file and the image effect data of those images.
In the embodiment of the present invention, the terminal acquires the image data of the images contained in the target video file from the first type data block of the target video file, acquires the image effect data of the images from the second type data block of the target video file, and uses the image data and the image effect data together as the target image data. The second type data block is the udta data block of the target video file. The udta data block ordinarily stores user-defined information about the target video file, and a player capable of playing transparent video can read it.
In practice, the terminal decapsulates the target video file, extracts the first type data block and the second type data block from the decapsulated file, and reads the image data in the first type data block and the corresponding image effect data in the second type data block.
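Putting the two modes together, the acquisition step can be dispatched on the capability flag roughly as follows; iter_boxes is the box walker from the previous sketch, and the box names are the illustrative ones assumed above.

```python
def acquire_target_image_data(mp4_bytes: bytes, can_play_transparent: bool):
    boxes = {box_type: payload for box_type, payload in iter_boxes(mp4_bytes)}
    image_data = boxes.get(b"mdat", b"")
    if not can_play_transparent:
        return image_data, None            # mode 1: image data only
    effect_data = boxes.get(b"udta", b"")  # mode 2: image data + effect data
    return image_data, effect_data
```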
It should be noted that the target video file may further store production parameters of the image data of the images and of the image effect data of the images, such as a sampling rate, a resolution, and/or a timestamp; an image and its corresponding image effect data share the same production parameters. When the production parameters include a timestamp, the terminal may search for the image effect data corresponding to an image based on that timestamp. Specifically, the search process may be: the terminal acquires the image data of an image and the image's timestamp from the first type data block, searches the multiple frames of image effect data for the frame whose timestamp is the same as the image's timestamp, and uses that frame as the image effect data corresponding to the image.
In fact, in the first type data block and the second type data block, the image data of the images and the image effect data may be stored frame by frame, and the production parameters of each frame of image may be stored at the byte position corresponding to that frame in the first type data block. For each frame of image, the terminal can acquire the frame's production parameters from the corresponding byte position. The production parameters of the image effect data are stored and acquired in the same way as those of the images, which is not repeated here.
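Under the assumption that each stored frame carries its timestamp among the production parameters, the lookup described above amounts to matching equal timestamps across the two blocks, for example:

```python
def match_effect_frames(image_frames, effect_frames):
    # image_frames / effect_frames: lists of (timestamp, payload) pairs
    # decoded from the first and second type data blocks respectively.
    effects_by_ts = {ts: payload for ts, payload in effect_frames}
    # Pair every image frame with the effect frame carrying the identical
    # timestamp; frames without a match get None.
    return [(ts, image, effects_by_ts.get(ts)) for ts, image in image_frames]
```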
503. The terminal plays the video based on the target image data.
In the embodiment of the present invention, the terminal determines the display area corresponding to the target video file and renders the multiple frames of images frame by frame in the display area according to the image data of the multiple frames of images in the target video file.
The target image data acquired in step 502 differs according to the capability information. Taking a multi-person session scenario as an example, this step can be implemented in the following two ways.
In the first mode, when the capability information indicates that the player does not have the capability of playing transparent video, the images contained in the target video file are displayed on the current session interface.
In this step, in a multi-person session scenario, the terminal acquires the display area of the target video file on the current session interface and renders each frame of image in the target video file in that display area according to the pixel value of each pixel point in the image.
It should be noted that when the player cannot play transparent video, it displays only the multiple frames of images in the target video file; it never displays both an image and the mask image corresponding to that image's effect data on the same display screen, so the target video file can be played compatibly on players of different capabilities.
In the second mode, when the capability information indicates that the player has the capability of playing transparent video, the target graphic identifier in the image is displayed on the current session interface, with the area of the image corresponding to the area indicated by the image effect data in a transparent display state.
In this step, the terminal reads the image data of an image from the first type data block, reads the image effect data of that frame from the second type data block, and renders the pixel value of each pixel point of the target graphic identifier in the image in the display area of the current session interface according to the transparent area indicated by the image effect data. For each frame of image, the terminal may use the image effect data whose timestamp is the same as that frame's timestamp as the frame's image effect data.
The terminal may divide the image into a transparent area and a non-transparent area based on the areas sharing the same transparency value in the image effect data. For the transparent area, the terminal sets the transparency of each pixel point in the area according to the designated transparency value; for the non-transparent area, the terminal renders the target graphic identifier of that area in the display area according to the area's pixel values.
In one possible design, the image effect data may also indicate an edge area between the transparent area and the non-transparent area, whose transparency value in the image effect data lies between those of the transparent area and the non-transparent area. The terminal renders the edge area in the display area by the same process, setting the transparency of each pixel point in the edge area according to the transparency value corresponding to the edge area.
The designated transparency value of the transparent area may be 0; the transparency value of the non-transparent area may be 1, 0.9, 0.92, and so on; and the transparency value corresponding to the edge area is a transition value distributed between the designated transparency value and the transparency value corresponding to the non-transparent area, for example 0.05, 0.09, 0.1, or 0.2.
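This rendering amounts to a per-pixel alpha blend between the frame and the session background, with the image effect data supplying the transparency value of each pixel. A sketch using NumPy arrays (an assumption for illustration; a terminal would normally do this blending on the GPU):

```python
import numpy as np

def composite_frame(frame_rgb: np.ndarray, mask: np.ndarray,
                    background_rgb: np.ndarray) -> np.ndarray:
    # mask holds one transparency value per pixel: 0 for the transparent
    # background area, values near 1 for the non-transparent graphic, and
    # transition values such as 0.1 for the edge area.
    alpha = mask.astype(np.float32)[..., np.newaxis]          # H x W x 1
    blended = alpha * frame_rgb + (1.0 - alpha) * background_rgb
    return blended.astype(np.uint8)
```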
As shown in fig. 6, in a session scenario, a terminal generally plays video through the playing plug-in of an application program. If the application program used by a user participating in the session has not been updated and its playing plug-in does not have the capability of playing transparent video, that user's terminal performs normal playing in the first mode. If the application program has been updated and its playing plug-in has the capability of playing transparent video, the user's terminal can play the transparent video in the second mode. Of course, when the terminal plays the video using a player that cannot play transparent video, it also performs normal playing in the first mode.
In one possible design, the target video file may further include voice data, and when the terminal plays the images in the target video file, it may also synchronously play the voice data corresponding to each image. The voice data can be encapsulated in the first type data block of the target video file, and the specified byte positions of the first type data block may also store the timestamp of each piece of voice data. Based on the timestamp of the frame of image currently being played, the terminal can acquire the piece of voice data with the same timestamp and play it within the display duration of that image.
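This synchronization can be sketched as a timestamp lookup: for the frame currently being shown, find the voice chunk with the identical timestamp and hand it to the audio output for the frame's display duration. The play_audio callback below is a hypothetical stand-in for the platform audio API.

```python
def play_frame_with_voice(frame_ts, voice_chunks, display_duration_ms, play_audio):
    # voice_chunks: list of (timestamp, pcm_bytes) pairs read from the
    # first type data block alongside the image data.
    for ts, pcm in voice_chunks:
        if ts == frame_ts:
            play_audio(pcm, duration_ms=display_duration_ms)
            return True
    return False  # no voice recorded for this frame
```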
According to the method provided by the embodiment of the present invention, when the target video file is played, the target image data corresponding to the capability information of the player is acquired from the images contained in the target video file and the image effect data of those images, so a player incapable of playing transparent video cannot acquire the image effect data from the play source. A player without transparent video playing capability therefore never displays two images simultaneously (one the image of the video picture, the other an image generated from the image effect data), but plays the video based only on the target image data, so the target video file can be played compatibly in different players.
To clearly describe the video recording and playing process, the following takes recording and playing based on a session of a social application as an example. Fig. 7 is a flowchart of a video recording and playing method according to an embodiment of the present invention. Referring to fig. 7, the method includes:
701. The first terminal displays a session interface.
702. When the first terminal detects the triggering operation of a recording button on the session interface, a recording preview area is displayed on the session interface, and the recording preview area comprises a plurality of candidate emoticons.
703. When the first terminal detects a triggering operation on any one of the candidate emoticons, it displays that emoticon in the recording preview area.
When the selected emoticon is displayed in the recording preview area, its display size can be enlarged so that it is larger than the display sizes of the candidate emoticons.
The emoticon is a specific representation of the target graphic identifier in the above embodiments; the embodiment of the present invention is described taking the emoticon as an example, but the target graphic identifier is not specifically limited. The manner in which the first terminal acquires the target emoticon is the same as in the above embodiments and is not repeated here.
As shown in fig. 8, when the recording button on the session interface is triggered, the first terminal displays a plurality of graphic identifiers such as a penguin identifier, a bear identifier, and a smiling-face identifier. When the penguin identifier is selected, the first terminal displays it enlarged in a preset area.
704. The first terminal collects expression change information of a user in real time and collects voice data of the user in real time through a microphone.
As shown in fig. 9, the first terminal starts to collect the behavior change information and voice data of the user and displays the recording duration under the penguin identifier.
705. The first terminal synchronizes the expression change information acquired in real time to the emoticon to obtain image data of multiple frames of images of the emoticon.
The multiple frames of images of the emoticon are used to simulate the user's expression changes. During recording, the multiple frames of images of the emoticon are displayed in the recording preview area of the session interface so that the user can observe the changes of the emoticon.
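A sketch of the capture loop implied by steps 704 and 705: expression changes are pushed onto the emoticon's model and the model is snapshotted into a two-dimensional frame at a fixed interval. The capture_expression callback and the model's apply/render_to_2d methods are hypothetical stand-ins for the terminal's face-capture and rendering APIs.

```python
import time

def record_emoticon_frames(model, capture_expression, duration_s: float,
                           interval_s: float = 1 / 30):
    frames = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        model.apply(capture_expression())     # sync the user's current expression
        frames.append(model.render_to_2d())   # snapshot one 2D emoticon frame
        time.sleep(interval_s)
    return frames
```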
706. The first terminal acquires the image effect data of the multiple frames of images of the emoticon.
The manner of acquiring the image effect data of the image by the first terminal is the same as that in the above embodiment, and details are not repeated here.
707. When receiving a recording completion instruction, the first terminal generates a target video file in MPEG-4 format based on the image data of the multiple frames of images and the image effect data of the multiple frames of images.
708. When receiving a sending instruction, the first terminal displays the video cover of the target video file in the session area of the current session interface and sends the target video file to at least one user participating in the session.
As shown in fig. 10, when the first terminal completes recording, it may display a sending option and a cancel option. When the sending option is triggered, the first terminal receives the sending instruction, sends the target video file to at least one user participating in the session, and displays the video cover of the target video file in the display area for session messages in the current session interface. The video cover is the target emoticon.
709. The second terminal displays the video cover of the target video file in a session area of a session interface of the second terminal.
The second terminal is a terminal to which any user participating in the session logs in.
710. Upon receiving a video playing instruction for the video cover, the second terminal receives the target video file in MPEG-4 format.
711. The second terminal acquires the image data of the multiple frames of images and the image effect data of the images from the target video file.
Of course, among the at least one user participating in the session, when some user's terminal cannot play transparent video, that terminal acquires only the image data of the multiple frames of images from the image data and the image effect data contained in the target video file.
712. The second terminal plays the video based on the target image data.
As shown in fig. 11a, the first terminal, as the sender, may display the emoticon in the image on the current session interface and reveal the session background of the current session interface in the background area of the emoticon. The change in the specific display can be seen more clearly in the effect comparison diagram shown in fig. 11b.
For a second terminal that cannot play transparent video, as shown in the left diagram of fig. 12, the second terminal displays each frame of image in the target video file opaquely, that is, it displays both the emoticon and the background area around it. For a second terminal that can play transparent video, as shown in the right diagram of fig. 12, the second terminal may display the emoticon in the image in the display area of the session message and reveal the session background of the current session interface in the background area of the emoticon. The difference in display effect can be seen more clearly in the effect comparison diagram shown in fig. 13.
It should be noted that the specific implementation of steps 701-712 is the same as in the above embodiments and is not described here again.
The above process is described taking as an example a first terminal participating in a session in a social application performing video recording and a second terminal performing video playing. In an actual scenario, the first terminal may also play the video after recording it; the specific process is the same as for the second terminal and is not detailed in the embodiment of the present invention. In addition, with this recording and playing method, a player without transparent video playing capability never displays two images simultaneously (one the image of the video picture, the other an image generated from the image effect data), but plays the video based only on the target image data, so the target video file can be played compatibly in different players, disordered display in the session interface is avoided, and terminals of any playing capability obtain a similar video experience.
Fig. 14 is a schematic structural diagram of a video playing apparatus according to an embodiment of the present invention. Referring to fig. 14, the apparatus includes: a receiving module 1401, an obtaining module 1402 and a playing module 1403.
A receiving module 1401, configured to receive a video playing instruction, where the video playing instruction is used to play a target video file;
an obtaining module 1402, configured to obtain, according to the capability information of the player, target image data corresponding to the capability information from image data of an image included in the target video file and image effect data of the image, where the image effect data of each frame of image is used to indicate a transparent area of the corresponding image when playing;
a playing module 1403, configured to play a video based on the target image data;
wherein the capability information is used to indicate whether the player has the capability of playing transparent video, and different capability information corresponds to different types of data of the image.
Optionally, the obtaining module 1402 is configured to, when the capability information indicates that the player does not have the capability of playing a transparent video, obtain only image data of an image included in the target video file from image data of the image included in the target video file and image effect data of the image; or alternatively,
the obtaining module 1402 is configured to obtain image data of an image included in the target video file and image effect data of the image from the image data of the image included in the target video file and the image effect data of the image when the capability information indicates that the player has a capability of playing a transparent video.
Optionally, the obtaining module 1402 is configured to obtain, from the first type data block of the target video file, image data of an image included in the target video file when the capability information indicates that the player does not have a capability of playing a transparent video; or when the capability information indicates that the player has the capability of playing transparent videos, acquiring image data of an image contained in the target video file from a first type data block of the target video file, and acquiring image effect data of the image from a second type data block of the target video file, wherein the first type data block and the second type data block are different in type.
Optionally, the first type data block is a trak data block of the target video file.
Optionally, the second type data block is udta data block of the target video file.
Optionally, the playing module 1403 is configured to display the image included in the target video file on the current session interface when the capability information indicates that the player does not have a capability of playing the transparent video; or when the capability information indicates that the player has the capability of playing the transparent video, displaying the target graphic identifier in the image on the current session interface, wherein the region in the image corresponding to the region indicated by the image effect data is in a transparent display state.
According to the apparatus provided by the embodiment of the present invention, when the target video file is played, the target image data corresponding to the capability information of the player is acquired from the images contained in the target video file and the image effect data of those images, so a player incapable of playing transparent video cannot acquire the image effect data from the play source. A player without transparent video playing capability therefore never displays two images simultaneously (one the image of the video picture, the other an image generated from the image effect data), but plays the video based only on the target image data, so the target video file can be played compatibly in different players.
Fig. 15 is a schematic structural diagram of a video recording apparatus according to an embodiment of the present invention. Referring to fig. 15, the apparatus includes: an acquisition module 1501 and a generation module 1502.
The acquiring module 1501 is configured to acquire image data of multiple frames of images of a target graphic identifier in real time when receiving a video recording instruction;
the obtaining module 1501 is further configured to obtain image effect data of the multiple frames of images, where each frame of image effect data is used to indicate a transparent area of the corresponding image when playing;
a generating module 1502, configured to generate, when a recording completion instruction is received, a target video file based on image data of the multiple frames of images and image effect data of the multiple frames of images, where a first type data block in the target video file stores the image data of the multiple frames of images, and a second type data block in the target video file stores the image effect data of the multiple frames of images;
wherein the data block types of the first type data block and the second type data block are different.
Optionally, the apparatus further comprises:
and the first display module is used for displaying the video cover of the target video file in the conversation area of the current conversation interface when the sending instruction is received.
Optionally, the apparatus further comprises:
the acquisition module is used for acquiring voice data in real time in the process of acquiring the image data of the multi-frame image of the target graphic identifier in real time;
and the storage module is used for storing the voice data in the first type data block of the target video file when the target video file is generated.
Optionally, the apparatus further comprises:
and the determining module is used for performing voice recognition on the voice data, determining the state information of the current behavior of the target user according to the recognition result, and adding a graphic special effect corresponding to the state information on the image of the target graphic identifier according to the state information of the current behavior.
Optionally, the apparatus further comprises:
and the second display module is used for playing the target video file in the current session interface and displaying the voice message frame at the preset position of the target video file when the sending instruction is received.
Optionally, the obtaining module 1501 includes:
the acquiring unit is used for acquiring behavior change information of a target user in real time;
and the generating unit is used for synchronizing the behavior change information to the target graph identifier in real time and generating the image data of the multi-frame image of the target graph identifier.
Optionally, the generating unit is configured to synchronize the behavior change information to the three-dimensional model of the target graphic identifier to simulate the behavior change of the target user, and to acquire the image data of multiple frames of two-dimensional images of the three-dimensional model during the synchronization process of the three-dimensional model of the target graphic identifier.
Optionally, the generating unit is configured to map the three-dimensional image of the three-dimensional model into a two-dimensional plane every preset time interval during the synchronization process, so as to obtain image data of one frame of two-dimensional image.
Optionally, the behavior change information includes expression change information and/or limb movement change information of the target user.
Optionally, the obtaining module 1501 is further configured to shoot a target user to obtain image data of multiple frames of portrait images of the target user; and for the image data of each frame of portrait image, directly taking the image data of the portrait image as the image data of one frame of image of the target graphic identifier, or adding a face graphic in the portrait image to the target graphic identifier to generate the image data of a plurality of frames of images of the target graphic identifier.
Optionally, the obtaining module 1501 is configured to perform image recognition on a target graphic identifier in each frame of image to obtain a transparent area and a non-transparent area, where the transparent area is an area occupied by an image background in the image, and the non-transparent area is an area occupied by the target graphic identifier in the image; and generating the image effect data according to the designated transparency value corresponding to the transparent area and the transparency value corresponding to the non-transparent area.
Optionally, the image data of the image and the image effect data of the image have the same production parameters, and the production parameters include sampling frequency, resolution and/or time stamp.
Optionally, the process of acquiring image data of multiple frames of images in real time is executed through a rendering link, and the process of generating the target video file is executed through a recording link.
Optionally, the packaging format of the target video file is MPEG-4 format, the first type data block is a trak data block of the target video file, and the second type data block is a udta data block of the target video file.
In the embodiment of the present invention, when recording a video, the terminal acquires the image data of multiple frames of images of the target graphic identifier and the corresponding image effect data, and encapsulates the image data and the image effect data in different types of data blocks, chosen according to the data block types read by players of different capabilities, to obtain the target video file. As a result, each player can only acquire the target image data matched with its own playing capability, and because the two kinds of data are stored separately during recording, a player incapable of playing transparent video is prevented from simultaneously displaying two images (one the image of the video picture, the other an image generated from the image effect data), so the target video file can be played compatibly by different players.
All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
It should be noted that the video recording apparatus and the video playing apparatus provided in the above embodiments are illustrated only with the division into the above functional modules when recording and playing videos; in practical applications, the above functions may be distributed to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the video recording apparatus and the video playing apparatus provided in the above embodiments belong to the same concept as the embodiments of the video recording method and the video playing method; their specific implementation processes are described in detail in the method embodiments and are not repeated here.
Fig. 16 is a schematic structural diagram of a terminal according to an embodiment of the present invention. The terminal 1600 may be: a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a notebook computer, or a desktop computer. Terminal 1600 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
Generally, terminal 1600 includes: a processor 1601, and a memory 1602.
Processor 1601 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 1601 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). Processor 1601 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also referred to as a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 1601 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1601 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 1602 may include one or more computer-readable storage media, which may be non-transitory. The memory 1602 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1602 is used to store at least one instruction for execution by processor 1601 to implement a video playback, video recording method provided by method embodiments of the present application.
In some embodiments, the terminal 1600 may also optionally include: peripheral interface 1603 and at least one peripheral. Processor 1601, memory 1602 and peripheral interface 1603 may be connected by buses or signal lines. Various peripheral devices may be connected to peripheral interface 1603 via buses, signal lines, or circuit boards. Specifically, the peripheral device includes: at least one of a radio frequency circuit 1604, a touch screen display 1605, a camera 1606, audio circuitry 1607, a positioning component 1608, and a power supply 1609.
Peripheral interface 1603 can be used to connect at least one I/O (Input/Output) related peripheral to processor 1601 and memory 1602. In some embodiments, processor 1601, memory 1602, and peripheral interface 1603 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1601, the memory 1602 and the peripheral device interface 1603 may be implemented on a separate chip or circuit board, which is not limited by this embodiment.
The radio frequency circuit 1604 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1604 communicates with communication networks and other communication devices via electromagnetic signals, converting an electrical signal into an electromagnetic signal for transmission, or converting a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1604 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1604 may communicate with other terminals via at least one wireless communication protocol, including but not limited to: metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1604 may also include NFC (Near Field Communication) related circuits, which is not limited in this application.
The display 1605 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1605 is a touch display screen, the display screen 1605 also has the ability to capture touch signals on or over the surface of the display screen 1605. The touch signal may be input to the processor 1601 as a control signal for processing. At this point, the display 1605 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display 1605 may be one, providing the front panel of the terminal 1600; in other embodiments, the display screens 1605 can be at least two, respectively disposed on different surfaces of the terminal 1600 or in a folded design; in still other embodiments, display 1605 can be a flexible display disposed on a curved surface or a folded surface of terminal 1600. Even further, the display 1605 may be arranged in a non-rectangular irregular pattern, i.e., a shaped screen. The Display 1605 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or other materials.
The camera assembly 1606 is used to capture images or video. Optionally, camera assembly 1606 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of the terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 1606 can also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
The audio circuitry 1607 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 1601 for processing or inputting the electric signals to the radio frequency circuit 1604 to achieve voice communication. For stereo sound acquisition or noise reduction purposes, the microphones may be multiple and disposed at different locations of terminal 1600. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 1601 or the radio frequency circuit 1604 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuit 1607 may also include a headphone jack.
The positioning component 1608 is configured to locate the current geographic location of the terminal 1600 to implement navigation or LBS (Location Based Service). The positioning component 1608 may be a positioning component based on the United States' GPS (Global Positioning System), China's BeiDou system, Russia's GLONASS system, or the European Union's Galileo system.
Power supply 1609 is used to provide power to the various components of terminal 1600. Power supply 1609 may be alternating current, direct current, disposable or rechargeable. When power supply 1609 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, terminal 1600 also includes one or more sensors 1610. The one or more sensors 1610 include, but are not limited to: acceleration sensor 1611, gyro sensor 1612, pressure sensor 1613, fingerprint sensor 1614, optical sensor 1615, and proximity sensor 1616.
Acceleration sensor 1611 may detect acceleration in three coordinate axes of a coordinate system established with terminal 1600. For example, the acceleration sensor 1611 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 1601 may control the touch display screen 1605 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1611. The acceleration sensor 1611 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 1612 can detect the body orientation and rotation angle of the terminal 1600, and can cooperate with the acceleration sensor 1611 to collect the user's 3D actions on the terminal 1600. Based on the data collected by the gyro sensor 1612, the processor 1601 can implement the following functions: motion sensing (such as changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
Pressure sensors 1613 may be disposed on a side bezel of terminal 1600 and/or underlying touch display 1605. When the pressure sensor 1613 is disposed on the side frame of the terminal 1600, a user's holding signal of the terminal 1600 can be detected, and the processor 1601 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 1613. When the pressure sensor 1613 is disposed at the lower layer of the touch display 1605, the processor 1601 controls the operability control on the UI interface according to the pressure operation of the user on the touch display 1605. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 1614 is configured to collect a fingerprint of the user, and the processor 1601 is configured to identify the user based on the fingerprint collected by the fingerprint sensor 1614, or the fingerprint sensor 1614 is configured to identify the user based on the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 1601 authorizes the user to perform relevant sensitive operations including unlocking a screen, viewing encrypted information, downloading software, paying for and changing settings, etc. The fingerprint sensor 1614 may be disposed on the front, back, or side of the terminal 1600. When a physical key or vendor Logo is provided on the terminal 1600, the fingerprint sensor 1614 may be integrated with the physical key or vendor Logo.
The optical sensor 1615 is used to collect ambient light intensity. In one embodiment, the processor 1601 may control the display brightness of the touch display screen 1605 based on the ambient light intensity collected by the optical sensor 1615. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 1605 is increased; when the ambient light intensity is low, the display brightness of the touch display 1605 is turned down. In another embodiment, the processor 1601 may also dynamically adjust the shooting parameters of the camera assembly 1606 based on the ambient light intensity collected by the optical sensor 1615.
The proximity sensor 1616, also called a distance sensor, is typically disposed on the front panel of the terminal 1600 and is used to collect the distance between the user and the front surface of the terminal 1600. In one embodiment, when the proximity sensor 1616 detects that the distance between the user and the front surface of the terminal 1600 gradually decreases, the processor 1601 controls the touch display 1605 to switch from the bright-screen state to the off-screen state; when the proximity sensor 1616 detects that the distance gradually increases, the processor 1601 controls the touch display 1605 to switch from the off-screen state to the bright-screen state.
Those skilled in the art will appreciate that the configuration shown in fig. 16 is not intended to be limiting of terminal 1600, and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be employed.
In an exemplary embodiment, a computer-readable storage medium, such as a memory including instructions executable by a processor in a terminal, is also provided to perform the video playing method or the video recording method in the above embodiments. For example, the computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (13)

1. A video playing method, the method comprising:
receiving a video playing instruction, wherein the video playing instruction is used for playing a target video file;
when the capability information of the player indicates that the player does not have the capability of playing the transparent video, acquiring image data of an image contained in the target video file from a first type data block of the target video file;
when the capability information indicates that the player has the capability of playing transparent videos, acquiring image data of images contained in the target video file from the first type data block of the target video file, and acquiring image effect data of the images from the second type data block of the target video file, wherein the image effect data of each frame of image is used for indicating a transparent area of a corresponding image during playing;
playing the video based on the acquired data;
the first type data blocks can be read by a player with the capability of playing transparent videos and can also be read by a player without the capability of playing transparent videos, and the second type data blocks can be read by a player with the capability of playing transparent videos but cannot be read by a player without the capability of playing transparent videos.
2. The method of claim 1, wherein the first type of data blocks are trak data blocks of the target video file, and wherein the second type of data blocks are udta data blocks of the target video file.
3. The method of claim 1, wherein the playing video based on the obtained data comprises:
when the capability information indicates that the player does not have the capability of playing transparent videos, displaying images contained in the target video file on a current session interface; or
and when the capability information indicates that the player has the capability of playing the transparent video, displaying the target graphic identifier in the image on the current session interface, wherein the region in the image corresponding to the region indicated by the image effect data is in a transparent display state.
4. A method for video recording, the method comprising:
when a video recording instruction is received, acquiring image data of a plurality of frames of images of a target graphic identifier in real time;
acquiring image effect data of the multiple frames of images, wherein each frame of image effect data is used for indicating a transparent area of the corresponding image when the corresponding image is played;
when a recording completion instruction is received, generating a target video file based on the image data of the multiple frames of images and the image effect data of the multiple frames of images, wherein a first type data block in the target video file stores the image data of the multiple frames of images, and a second type data block in the target video file stores the image effect data of the multiple frames of images;
the first type data block can be read by a player with the capability of playing transparent videos and can also be read by a player without the capability of playing transparent videos, the second type data block can be read by a player with the capability of playing transparent videos but cannot be read by a player without the capability of playing transparent videos, and the player is used for playing the target video file according to the read data.
5. The method of claim 4, wherein after generating the target video file, the method further comprises:
and when a sending instruction is received, displaying a video cover of the target video file in a conversation area of the current conversation interface.
6. The method of claim 4, further comprising:
acquiring voice data in real time in the process of acquiring image data of a plurality of frames of images of a target graphic identifier in real time;
and when the target video file is generated, storing the voice data in a first type data block of the target video file.
7. The method of claim 4, wherein the obtaining image data of a plurality of frames of images of the target graphic identifier in real time comprises:
acquiring behavior change information of a target user in real time, synchronizing the behavior change information to the target graphic identifier in real time, and generating image data of a plurality of frames of images of the target graphic identifier; or
shooting a target user to obtain image data of a plurality of frames of portrait images of the target user; and for the image data of each frame of portrait image, directly taking the image data of the portrait image as the image data of one frame of image of the target graphic identifier, or adding a face graphic in the portrait image to the target graphic identifier to generate the image data of the multi-frame image of the target graphic identifier.
8. The method of claim 4, wherein the obtaining image effect data for the plurality of frames of images comprises:
for each frame of image, carrying out image recognition on a target graphic identifier in the image to obtain a transparent area and a non-transparent area, wherein the transparent area is an area occupied by an image background in the image, and the non-transparent area is an area occupied by the target graphic identifier in the image;
and generating the image effect data according to the designated transparency value corresponding to the transparent area and the transparency value corresponding to the non-transparent area.
9. The method of claim 4, wherein the packaging format of the target video file is MPEG-4 format, the first type of data blocks are trak data blocks of the target video file, and the second type of data blocks are udta data blocks of the target video file.
10. A video playback apparatus, comprising:
the receiving module is used for receiving a video playing instruction, and the video playing instruction is used for playing a target video file;
the acquisition module is used for acquiring image data of an image contained in the target video file from a first type data block of the target video file when the capability information of the player indicates that the player does not have the capability of playing the transparent video; when the capability information indicates that the player has the capability of playing transparent videos, acquiring image data of images contained in the target video file from the first type data block of the target video file, and acquiring image effect data of the images from the second type data block of the target video file, wherein the image effect data of each frame of image is used for indicating a transparent area of a corresponding image during playing;
the playing module is used for playing videos based on the acquired data;
the first type data blocks can be read by a player with the capability of playing transparent videos and can also be read by a player without the capability of playing transparent videos, and the second type data blocks can be read by a player with the capability of playing transparent videos but cannot be read by a player without the capability of playing transparent videos.
11. A video recording apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring the image data of the multi-frame image of the target graphic identifier in real time when receiving a video recording instruction;
the acquisition module is further configured to acquire image effect data of the multiple frames of images, where each frame of image effect data is used to indicate a transparent area of a corresponding image when the corresponding image is played;
the generating module is used for generating a target video file based on the image data of the multiple frames of images and the image effect data of the multiple frames of images when a recording completion instruction is received, wherein a first type data block in the target video file stores the image data of the multiple frames of images, and a second type data block in the target video file stores the image effect data of the multiple frames of images;
the first type data block can be read by a player with the capability of playing transparent videos and can also be read by a player without the capability of playing transparent videos, the second type data block can be read by a player with the capability of playing transparent videos but cannot be read by a player without the capability of playing transparent videos, and the player is used for playing the target video file according to the read data.
12. A terminal, characterized in that the terminal comprises a processor and a memory, the memory storing at least one computer program which is loaded and executed by the processor to implement the operations performed by the video playing method according to any one of claims 1 to 3 or the operations performed by the video recording method according to any one of claims 4 to 9.
13. A computer-readable storage medium, characterized in that at least one computer program is stored therein, the computer program being loaded and executed by a processor to implement the operations performed by the video playing method according to any one of claims 1 to 3 or the operations performed by the video recording method according to any one of claims 4 to 9.
CN201810236991.1A 2018-03-21 2018-03-21 Video recording and playing method, device, terminal and storage medium Active CN110300275B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810236991.1A CN110300275B (en) 2018-03-21 2018-03-21 Video recording and playing method, device, terminal and storage medium


Publications (2)

Publication Number Publication Date
CN110300275A CN110300275A (en) 2019-10-01
CN110300275B (en) 2021-09-21

Family

ID=68025371

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810236991.1A Active CN110300275B (en) 2018-03-21 2018-03-21 Video recording and playing method, device, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN110300275B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113141536B (en) * 2020-01-17 2022-07-08 北京达佳互联信息技术有限公司 Video cover adding method and device, electronic equipment and storage medium
CN112055247B (en) * 2020-09-11 2022-07-08 北京爱奇艺科技有限公司 Video playing method, device, system and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104394324A (en) * 2014-12-09 2015-03-04 Chengdu Idealsee Technology Co., Ltd. Special-effect video generation method and device
CN105118082A (en) * 2015-07-30 2015-12-02 iFLYTEK Co., Ltd. Personalized video generation method and system
CN105765990A (en) * 2013-11-18 2016-07-13 Helen Bradley Lennon Video broadcasting system and method for transmitting video content
CN106406866A (en) * 2016-09-02 2017-02-15 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for supporting card transparentizing
CN107682713A (en) * 2017-04-11 2018-02-09 Tencent Technology (Beijing) Co., Ltd. Media file playing method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8839118B2 (en) * 2010-06-30 2014-09-16 Verizon Patent And Licensing Inc. Users as actors in content



Similar Documents

Publication Publication Date Title
CN112911182B (en) Game interaction method, device, terminal and storage medium
CN110572722B (en) Video clipping method, device, equipment and readable storage medium
CN108401124B (en) Video recording method and device
CN109191549B (en) Method and device for displaying animation
CN110213638B (en) Animation display method, device, terminal and storage medium
CN110061900B (en) Message display method, device, terminal and computer readable storage medium
CN109947338B (en) Image switching display method and device, electronic equipment and storage medium
CN109922356B (en) Video recommendation method and device and computer-readable storage medium
CN110300274B (en) Video file recording method, device and storage medium
WO2021043121A1 (en) Image face changing method, apparatus, system, and device, and storage medium
WO2022052620A1 (en) Image generation method and electronic device
CN111447389B (en) Video generation method, device, terminal and storage medium
CN113409427B (en) Animation playing method and device, electronic equipment and computer readable storage medium
CN111028566A (en) Live broadcast teaching method, device, terminal and storage medium
CN109451248B (en) Video data processing method and device, terminal and storage medium
CN112565806B (en) Virtual gift giving method, device, computer equipment and medium
CN111142838A (en) Audio playing method and device, computer equipment and storage medium
CN110750734A (en) Weather display method and device, computer equipment and computer-readable storage medium
CN110662105A (en) Animation file generation method and device and storage medium
CN111276122A (en) Audio generation method and device and storage medium
CN109819314B (en) Audio and video processing method and device, terminal and storage medium
CN111437600A (en) Plot showing method, plot showing device, plot showing equipment and storage medium
CN110300275B (en) Video recording and playing method, device, terminal and storage medium
CN112822544B (en) Video material file generation method, video synthesis method, device and medium
CN113032590B (en) Special effect display method, device, computer equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant