CN115589450B - Video recording method and device - Google Patents

Video recording method and device

Info

Publication number
CN115589450B
CN115589450B (application CN202211066834.3A)
Authority
CN
China
Prior art keywords: video, frames, audio, frame, time
Prior art date
Legal status
Active
Application number
CN202211066834.3A
Other languages
Chinese (zh)
Other versions
CN115589450A (en)
Inventor
肖瑶
杨毅轩
林晨
Current Assignee
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date
Filing date
Publication date
Application filed by Honor Device Co Ltd filed Critical Honor Device Co Ltd
Priority to CN202211066834.3A
Publication of CN115589450A
Application granted
Publication of CN115589450B
Legal status: Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00: Details of television systems
    • H04N 5/04: Synchronising
    • H04N 5/76: Television signal recording
    • H04N 5/91: Television signal processing therefor
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]
    • H04N 21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream
    • H04N 21/4302: Content synchronisation processes, e.g. decoder synchronisation
    • H04N 21/4307: Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N 21/43072: Synchronising the rendering of multiple content streams or additional data on the same device

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

A video recording method and device are provided, applied to the technical field of electronics. The method comprises: after a first instruction is received, M×T video frames and N×T audio frames preceding the current moment are respectively obtained and synthesized, with the first of the M video frames in each time unit guaranteed to be an I frame. Synthesizing the M×T video frames and the N×T audio frames preceding the current moment yields a first video in which the audio and the video pictures are guaranteed to be synchronized. In addition, compared with ensuring audio-video synchronization by comparing time stamps one by one, this recording method is simpler to carry out, can quickly generate the recorded video, and gives a better user experience.

Description

Video recording method and device
Technical Field
The present application relates to the field of electronics, and in particular, to a method and apparatus for video recording.
Background
With the continuous development of electronic devices, devices such as smart display devices, smart phones and computers are used ever more widely in daily life. At present, electronic devices can play video and record clips of it for playback. When such a clip is recorded, the audio data and the video data are synthesized to obtain a recorded video file. However, current means of recording such clips suffer from the audio and the video pictures being out of sync, which affects the user experience.
Disclosure of Invention
In view of this, the present application provides a video recording method, apparatus, computer-readable storage medium and computer program product, which can solve the problem of audio and video being out of sync and greatly improve the user experience.
In a first aspect, a method for video recording is provided, where the method is applied to an electronic device, and the method includes:
receiving a first instruction, wherein the first instruction is used for triggering generation of video;
respectively acquiring video frames of a first time length and audio frames of the first time length in response to the first instruction, wherein the first time length is composed of T time units, T is an integer greater than or equal to 1, the video frames of the first time length comprise M x T video frames, the first video frame of the M video frames in each time unit is a key frame, M represents the number of video frames in each time unit, M is an integer greater than or equal to 1, the audio frames of the first time length comprise N x T audio frames, N represents the number of audio frames in each time unit, and N is an integer greater than or equal to 1;
and synthesizing the video frames with the first time length and the audio frames with the first time length to obtain a first video with synchronous audio and video.
The above-described method may be performed by an electronic device (such as a smart display device) or by a chip in an electronic device (such as a chip in a smart display device). Based on the above scheme, after the first instruction is received, the M×T video frames and N×T audio frames preceding the current moment are respectively acquired and synthesized, with the first of the M video frames in each time unit guaranteed to be an I frame; synthesizing these M×T video frames and N×T audio frames yields a first video that is guaranteed to have synchronized audio and video. Audio-video synchronization means that the sound and the picture are played in step. In addition, compared with ensuring audio-video synchronization by comparing time stamps one by one, this recording method is simpler to carry out, can quickly generate the recorded video, and gives a better user experience.
In one possible implementation, among the M×T video frames, the time position corresponding to the m-th video frame in the t₁-th time unit is marked with a first identifier, t₁∈[1,T], m∈[2,M]; the first identifier is used to indicate that the m-th video frame is a data frame whose encoding failed;
the synthesizing the video frame with the first duration and the audio frame with the first duration comprises the following steps:
based on the first identifier, summing the W video frames preceding the m-th video frame and the W video frames following it and taking the average frame, using the average frame as the updated m-th video frame; and synthesizing based on the updated m-th video frame.
Illustratively, when synthesizing with the M×T video frames and the N×T audio frames, if a specially marked object is encountered (such as the m-th video frame of the t₁-th time unit marked by the first identifier), the W data objects before and after the m-th video frame may be averaged to fill that position before synthesis. Filling with an average-frame algorithm helps keep the synthesized video pictures coherent.
In one possible implementation, among the N×T audio frames, the time position corresponding to the n-th audio frame in the t₂-th time unit is marked with a second identifier, t₂∈[1,T], n∈[1,N]; the second identifier is used to indicate that the n-th audio frame is a data frame whose encoding failed;
the synthesizing the video frame with the first duration and the audio frame with the first duration comprises the following steps:
skipping the n-th audio frame during synthesis based on the second identifier, where skipping the n-th audio frame comprises: synthesizing using the audio frames other than the n-th audio frame among the N audio frames.
"skip" is understood to mean that the t-th is not used in the audio-video encapsulation 2 The nth audio frame of the time unit is synthesized, and the time position of the second identification mark is directly skipped for synthesis.
In one possible implementation, before receiving the first instruction, the method further includes:
respectively caching video frames of a second time length and audio frames of the second time length, wherein the second time length consists of A time units, A is an integer greater than or equal to 1, wherein the video frames of the second time length comprise M x A video frames, the first video frame of the M video frames in each time unit is a key frame, M represents the number of the video frames in each time unit, M is an integer greater than or equal to 1, the audio frames of the second time length comprise N x A audio frames, N represents the number of the audio frames in each time unit, N is an integer greater than or equal to 1, and 'x' represents multiplication;
the obtaining the video frame with the first duration and the audio frame with the first duration includes:
and sequentially acquiring M x T video frames before a first moment from the M x A video frames, and sequentially acquiring N x T audio frames before the first moment from the N x A audio frames, wherein the first moment is the moment when the first instruction is received.
When the electronic device is playing video or the user watches video through the electronic device, the electronic device can respectively start the audio and video data recording buffer to prepare for the subsequent synthesis of the video (such as the first video) at the highlight moment. Thus, when the first instruction is received, m×t video frames before the current time and n×t audio frames before the current time may be read from the buffer.
In one possible implementation, M×A is greater than or equal to M×T, and N×A is greater than or equal to N×T. When the first instruction is received, the data cached in the electronic device is sufficient to generate the video of the first duration.
In one possible implementation, when M×A is less than M×T, the M×T video frames are the M×A video frames; when N×A is less than N×T, the N×T audio frames are the N×A audio frames. When the first instruction is received, the data cached in the electronic device is not yet sufficient to generate a video of the first duration. In this case, the video may be generated based on the data already in the cache.
In one possible implementation manner, the first duration is a preset duration (or a preset recording duration), where an end time of the first duration is a time when the first instruction is received.
In one possible implementation manner, the electronic device is an intelligent display device, the first instruction is an instruction triggered by a user through a controller, and the controller establishes communication connection with the intelligent display device.
In a second aspect, there is provided an apparatus for video recording comprising means for performing any of the methods of the first aspect. The device can be an intelligent display device or a chip in the intelligent display device. The device comprises an input unit, a display unit and a processing unit.
When the apparatus is a terminal, the processing unit may be a processor, the input unit may be a communication interface, and the display unit may be a graphic processing module and a screen; the terminal may further comprise a memory for storing computer program code which, when executed by the processor, causes the terminal to perform any of the methods of the first aspect.
When the device is a chip in the terminal, the processing unit may be a logic processing unit in the chip, the input unit may be an output interface, a pin, a circuit, or the like, and the display unit may be a graphics processing unit in the chip; the chip may also include memory, which may be memory within the chip (e.g., registers, caches, etc.), or memory external to the chip (e.g., read-only memory, random access memory, etc.); the memory is for storing computer program code which, when executed by the processor, causes the chip to perform any of the methods of the first aspect.
In a third aspect, there is provided a computer readable storage medium storing computer program code which, when run by an apparatus for video recording, causes the apparatus to perform any one of the methods of the first aspect.
In a fourth aspect, there is provided a computer program product comprising: computer program code which, when run by a video-recorded apparatus, causes the apparatus to perform any of the methods of the first aspect.
Drawings
FIG. 1 is an exemplary diagram of an application scenario of an embodiment of the present application;
FIG. 2 is a schematic diagram of a hardware system suitable for use with the electronic device of the present application;
FIG. 3 is a schematic diagram of a software system suitable for use with the electronic device of the present application;
FIG. 4 is a schematic flow chart of a method of video recording according to an embodiment of the present application;
FIG. 5 is an exemplary diagram of a video cache array according to an embodiment of the present application;
FIG. 6 is an exemplary diagram of an audio cache array according to an embodiment of the present application;
fig. 7 is a schematic block diagram of an apparatus for video recording according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings.
The video recording method provided by the embodiments of the present application can be applied to an electronic device having a video recording function. For example, the electronic device may be a smart display device, a smart television, a smart screen, a tablet, a notebook computer, a mobile phone, a wearable device, a multimedia playing device, an e-book reader, a personal computer, an ultra-mobile personal computer (UMPC), a personal digital assistant (PDA), a netbook, an augmented reality (AR) device, a virtual reality (VR) device, a projector, or the like. The embodiments of the present application do not limit the specific form or type of the electronic device.
Taking a smart television as an example of the electronic device, the smart television may be communicatively connected to other electronic devices, or may be connected to a remote controller.
The term "remote control" as referred to in the embodiments of the present application refers to a component of an electronic device that can typically be controlled wirelessly over a relatively short distance. Typically, the electronic device is connected to the electronic device using infrared and/or Radio Frequency (RF) signals and/or bluetooth, and may also include functional modules such as WiFi, wireless USB, bluetooth, motion sensors, etc. For example: the hand-held touch remote controller replaces most of the physical built-in hard keys in a general remote control device with a touch screen user interface.
The following describes an example of an electronic device as a smart display device in connection with fig. 1.
Fig. 1 shows an exemplary diagram of an application scenario of an embodiment of the present application. As shown in fig. 1, the smart display device 101 may be in communication with a remote control 102.
Communication between the remote controller 102 and the smart display device 101 includes infrared protocol communication or bluetooth protocol communication, other short-range communication modes, and the like, and the smart display device 101 is controlled by wireless or other wired modes. The user may control the smart display device 101 by inputting user instructions through keys on a remote control, voice input, control panel input, etc. Such as: the user can input corresponding control instructions through volume up-down keys, channel control keys, up/down/left/right movement keys, voice input keys, menu keys, on-off keys and the like on the remote controller, so as to realize the function of controlling the intelligent display device 101. In some embodiments, a key for setting "Wonderful time" is provided in the remote controller 102, and the user presses the key to trigger the intelligent display device 101 to generate the video in the background.
It should be understood that only remote control 102 is shown in fig. 1 to control smart display device 101. In fact, the remote control 102 in fig. 1 may be replaced by other control devices.
In some embodiments, mobile terminals, tablet computers, notebook computers, and other smart devices may also be used to control the smart display device 101. For example, the smart display device 101 is controlled using an application running on the smart device. The application program, by configuration, can provide various controls to the user in an intuitive User Interface (UI) on a screen associated with the smart device.
In some embodiments, a software application associated with the smart display device 101 may be installed on a mobile phone, and connection and communication may be implemented through a network communication protocol, achieving one-to-one control operation and data communication. For example, a control instruction protocol can be established between the mobile phone and the smart display device 101, the remote-control keyboard can be synchronized onto the mobile phone, and the smart display device 101 can be controlled through the user interface on the mobile phone. The audio and video content displayed on the mobile phone can also be transmitted to the smart display device 101 to realize a synchronized display function.
It should be understood that the scenario in fig. 1 is only a schematic illustration of one application scenario of the present application, which is not limiting to the embodiment of the present application, and the present application is not limited thereto.
As also shown in fig. 1, the smart display device 101 may also be in data communication with a server through a variety of communication means. The smart display device 101 may be allowed to make communication connections through a local area network (LAN), a wireless local area network (WLAN) and other networks. The server may provide various content and interactions to the smart display device 101. For example, the smart display device 101 may receive software program updates, or access a remotely stored digital media library, by sending and receiving information and through electronic program guide (EPG) interactions. The server may be one cluster or multiple clusters, and may include one or more types of servers. Other web service content, such as video on demand and advertising services, may also be provided by the server.
The intelligent display device 101 may be a liquid crystal display, an organic light-emitting diode (OLED) display, or a projection display device. The embodiment of the present application does not limit the type, size, resolution, etc. of the smart display device 101. It will be appreciated that the smart display device 101 may make some changes in performance and configuration as desired.
In addition to the broadcast-receiving television function, the smart display device 101 may additionally provide smart network television functions with computer support, including but not limited to network TV, smart TV, internet protocol television (IPTV), and the like.
The following describes a hardware system and a software architecture applicable to the embodiment of the present application with reference to fig. 2 and 3.
Fig. 2 shows a hardware system suitable for the electronic device of the present application. As shown in fig. 2, the electronic device 200 may include a processor 210, a memory 220, a wireless communication module 230, a display 240, a power module 250, an audio module 260, and a sensor module 270. Wherein the audio module 260 includes a microphone 261 and a speaker 262. The sensor module 270 includes a pressure sensor 271 and a touch sensor 272.
The configuration shown in fig. 2 does not constitute a specific limitation on the electronic apparatus 200. In other embodiments of the present application, the electronic device 200 may include more or fewer components than those shown in FIG. 2, or the electronic device 200 may include a combination of some of the components shown in FIG. 2, or the electronic device 200 may include sub-components of some of the components shown in FIG. 2. The components shown in fig. 2 may be implemented in hardware, software, or a combination of software and hardware.
Processor 210 is operative to read and execute computer readable instructions. Processor 210 may include one or more processing units. For example, the processor 210 may include at least one of the following processing units: application processors (application processor, AP), modem processors, graphics processors (graphics processing unit, GPU), image signal processors (image signal processor, ISP), controllers, video codecs, digital signal processors (digital signal processor, DSP), baseband processors, neural-Network Processors (NPU). The different processing units may be separate devices or integrated devices.
In some embodiments, processor 210 may include a controller, an arithmetic unit, and registers. The controller is mainly responsible for decoding instructions and sending out control signals for the operations corresponding to the instructions. The registers are mainly responsible for temporarily storing register operands, intermediate operation results and the like during instruction execution. In a specific implementation, the hardware architecture of the processor 210 may be an application-specific integrated circuit (ASIC) architecture, a MIPS architecture, an ARM architecture, an NP architecture, or the like.
In some embodiments, the processor 210 may be configured to parse signals or instructions received by the wireless communication module 230. For example, when the electronic device 200 is the smart display device 101 shown in fig. 1, the electronic device 200 may receive an instruction transmitted from the remote controller.
In some embodiments, the processor 210 is configured to: receive a first instruction, the first instruction being used to trigger generation of a video; respectively acquire video frames of a first duration and audio frames of the first duration in response to the first instruction, wherein the first duration is composed of T time units, T is an integer greater than or equal to 1, the video frames of the first duration comprise M×T video frames, the first video frame of the M video frames in each time unit is a key frame, M represents the number of video frames in each time unit, M is an integer greater than or equal to 1, the audio frames of the first duration comprise N×T audio frames, N represents the number of audio frames in each time unit, and N is an integer greater than or equal to 1; and synthesize the video frames of the first duration and the audio frames of the first duration to obtain a first video with synchronized audio and video.
Memory 220 is coupled to processor 210 for storing various software programs and/or sets of instructions. In some embodiments, memory 220 may include high-speed random access memory, and may also include non-volatile memory, such as one or more disk storage devices, flash memory devices, or other non-volatile solid state storage devices. Memory 220 may store an operating system such as an embedded operating system like uos, vxWorks, RTLinux, etc.
In some embodiments, memory 220 may store a plurality of data objects. For example, the memory 220 may buffer a video frames, a being an integer greater than or equal to 2. For another example, the memory 220 may buffer B audio frames, B being an integer greater than or equal to 2.
The wireless communication module 230 may provide solutions for wireless communication applied to the electronic device 200, including wireless local area network (WLAN) (e.g., wireless fidelity (Wi-Fi) network), Bluetooth (BT), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like. The wireless communication module 230 may be one or more devices integrating at least one communication processing module. The wireless communication module 230 receives electromagnetic waves, performs frequency modulation and filtering on the electromagnetic wave signals, and transmits the processed signals to the processor 210.
In some embodiments, the electronic device 200 may implement network connection with other electronic devices through the wireless communication module 230, for example, implement connection with a local area network of the other electronic devices, so as to implement functions of screen projection, multi-screen collaboration, and the like. For another example, the electronic device 200 may establish a connection with a remote control through the wireless communication module 230.
The display 240 includes a display screen. The display screen may be used to display images or video. The display screen includes a display panel. The display panel may employ a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini light-emitting diode (Mini LED), a Micro light-emitting diode (Micro LED), a Micro OLED (Micro OLED), or a quantum dot LED (quantum dot light emitting diodes, QLED).
The electronic device 200 may implement display functions through a GPU, a display screen 240, and an application processor. The GPU is a microprocessor for image processing, and is connected to the display screen 240 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 210 may include one or more GPUs that execute program instructions to generate or change display information.
The electronic device 200 may implement audio functions, such as playing audio files, through the audio module 260, the microphone 261, the speaker 262, the application processor, and the like.
It should be understood that the connection relationships between the modules shown in fig. 2 are only illustrative, and do not constitute a limitation on the connection relationships between the modules of the electronic device 200. Alternatively, the modules of the electronic device 200 may also use a combination of the various connection manners in the foregoing embodiments.
The pressure sensor 271 is used to sense a pressure signal and can convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 271 may be provided on the display 240. The pressure sensor 271 may be of various kinds, for example a resistive pressure sensor, an inductive pressure sensor, or a capacitive pressure sensor. A capacitive pressure sensor may be a device comprising at least two parallel plates of conductive material; when a force is applied to the pressure sensor 271, the capacitance between the electrodes changes, and the electronic device 200 determines the strength of the pressure based on the change in capacitance. When a touch operation acts on the display 240, the electronic device 200 detects the touch operation through the pressure sensor 271. The electronic device 200 may also calculate the position of the touch from the detection signal of the pressure sensor 271.
The touch sensor 272 is also referred to as a touch device. The touch sensor 272 may be disposed on the display 240; together, the touch sensor 272 and the display 240 form a touch screen. The touch sensor 272 is used to detect a touch operation acting on or near it. The touch sensor 272 may pass the detected touch operation to the application processor to determine the type of touch event. Visual output related to the touch operation may be provided through the display 240. In other embodiments, the touch sensor 272 may also be disposed on a surface of the electronic device 200 at a location different from that of the display 240.
Optionally, the electronic device 200 may also include keys (not shown in fig. 2). The keys include a power-on key and an audio key. The keys may be mechanical keys or touch keys. The electronic device 200 may receive a key input signal and implement a function associated with the key input signal.
The hardware system of the electronic device 200 is described in detail above; the software system of the electronic device 200 is described below. The software system may employ a layered architecture, an event-driven architecture, a microkernel architecture, a micro-service architecture, or a cloud architecture; the embodiments of the present application take a layered architecture as an example to describe the software system of the electronic device 200.
As shown in fig. 3, the software system using the layered architecture is divided into several layers, each of which has a clear role and division. The layers communicate with each other through a software interface. In some embodiments, the software system may be divided into four layers, from top to bottom, an application layer, an application framework layer, a HAL layer & kernel layer, and a driver layer, respectively.
The application layer may include a video APP, a TV service (TvServiceTv), and a screen-recording APP (e.g., WonderfulTv).
It should be understood that fig. 3 only shows a part of an application program of the electronic device, and embodiments of the present application are not limited thereto. For example, applications such as gallery, calendar, WLAN, bluetooth, music, etc. may also be installed in the electronic device.
The application framework layer provides an application programming interface (application programming interface, API) and programming framework for application programs of the application layer. The application framework layer may include some predefined functions.
For example, the framework layer includes an audio manager (AudioFlinger), a video player (MediaPlayer), and a display system (SurfaceFlinger). It should be understood that the descriptive terms given for AudioFlinger, MediaPlayer and SurfaceFlinger are merely for the convenience of the reader and do not limit the embodiments of the present application. In the Android architecture, the specific meanings of AudioFlinger, MediaPlayer and SurfaceFlinger are understood by those skilled in the art.
AudioFlinger is used to assist in the execution of audio policies, management of audio streaming devices, transmission of data, and the like.
MediaPlayer is used to assist in the management and transmission of video data.
SurfaceFlinger is used to manage data related to the interface (such as the controls included in the interface). For example, SurfaceFlinger may synthesize graphics display data.
The application layer and the application framework layer run in a virtual machine. The virtual machine executes java files of the application program layer and the application program framework layer as binary files. The virtual machine is used for executing functions such as management of object life cycle, stack management, thread management, security and exception management, garbage collection and the like.
HAL layer & kernel layer is the layer between hardware and software. As shown in fig. 3, the HAL layer & kernel layer includes: an Audio Output (AO) module, a Video Output (VO) module, an on-screen display (OSD) module, a mixing module (MIX), an AudioCapture module, and a ScreenCapture module.
The driving layer may include driving of various communication interfaces, such as high definition multimedia interface (high definition multimedia interface, HDMI)/digital television (digital television, DTV)/Audio Video (AV) driving.
The roles of the individual modules are described in connection with the following example and the architecture in fig. 3.
For example, for a scene of playing video, when the video APP in the application layer plays a video, the audio data is transmitted to the Audio Output (AO) module of the HAL layer through AudioFlinger in the framework layer; the video data is transmitted to the Video Output (VO) module of the HAL layer through MediaPlayer in the framework layer; and the interface (e.g., a progress bar, etc.) is transmitted to the on-screen display (OSD) module via SurfaceFlinger in the framework layer.
Illustratively, for scenes playing live pictures, the live signal is decoded by HDMI/DTV/AV in the driver layer and sent to the AO and VO modules in the HAL layer.
Both scenes transmit a send-display signal when video playback starts; this signal is transmitted to TvServiceTv in the application layer. In addition, an end signal is transmitted when video playback ends. TvServiceTv listens for both signals. When TvServiceTv receives the send-display signal, it sends a send-display notification to WonderfulTv and can pull up the screen-recording service (ScreenRecordService) of WonderfulTv (WonderfulTv may be replaced by another screen-recording APP). The recording service is used to start recording the audio and video. Specifically, WonderfulTv obtains the audio data (i.e., AO data) and the video data (i.e., VO data) by calling the ScreenCapture and AudioCapture interfaces encapsulated by TvServiceTv. Correspondingly, the HAL layer obtains the audio data through the AudioCapture class and the video data through the ScreenCapture class. AudioCapture is used to obtain the AO data, and WonderfulTv places (or saves) the AO data into an audio cache array, e.g., N frames per second. ScreenCapture is used to obtain the VO data; more precisely, it obtains the mixed data (MIX) of VO and OSD, and WonderfulTv places (or saves) the mixed data into a video cache array, e.g., M frames per second. When WonderfulTv later synthesizes a video, it can obtain the VO data from the video cache array and the AO data from the audio cache array.
It will be appreciated that the foregoing description merely takes WonderfulTv as an example of the screen-recording application, and embodiments of the present application are not limited thereto. In fact, WonderfulTv may be replaced with another screen-recording application.
Illustratively, when the user triggers a highlight moment through the remote controller, WonderfulTv receives a trigger broadcast (or a trigger instruction, which may correspond to the first instruction below) and invokes the RecordManager interface recordMovement. Video data is first written into the audio-video muxing tool (MediaMuxer) from the video cache array: taking the position currently being written (denoted Mᵢ) as a reference, writing starts at position Mᵢ - M×T (T is the recording duration and corresponds to the first duration below) and ends at position Mᵢ. Audio data is likewise written into MediaMuxer from the audio cache array, starting at position Nᵢ - N×T (where Nᵢ denotes the position currently being written in the audio cache array) and ending at position Nᵢ. After the data has been written, an MP4 file is generated and saved to the memory.
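To make this flow concrete, the following is a minimal sketch, in Java, of how already-encoded cached frames might be written into Android's MediaMuxer to produce the MP4 file. MediaMuxer, MediaFormat and MediaCodec.BufferInfo are real Android classes; the HighlightMuxer and CachedFrame types, their field names, and the assumption that each cached frame already carries encoded sample data with a timestamp are hypothetical and are not taken from the patent.

import android.media.MediaCodec;
import android.media.MediaFormat;
import android.media.MediaMuxer;

import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.List;

/** Minimal sketch: mux already-encoded, cached video and audio frames into an MP4 file. */
public class HighlightMuxer {

    /** Hypothetical cached frame: encoded sample bytes plus timestamp and flags. */
    public static class CachedFrame {
        public ByteBuffer data;          // encoded sample data, starting at buffer position 0
        public long presentationTimeUs;  // presentation timestamp in microseconds
        public int flags;                // e.g. MediaCodec.BUFFER_FLAG_KEY_FRAME for I frames
    }

    public static void mux(String outputPath,
                           MediaFormat videoFormat, List<CachedFrame> videoFrames,
                           MediaFormat audioFormat, List<CachedFrame> audioFrames)
            throws IOException {
        MediaMuxer muxer = new MediaMuxer(outputPath,
                MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4);
        int videoTrack = muxer.addTrack(videoFormat);   // tracks must be added before start()
        int audioTrack = muxer.addTrack(audioFormat);
        muxer.start();

        MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
        for (CachedFrame f : videoFrames) {              // the M*T cached video frames
            info.set(0, f.data.remaining(), f.presentationTimeUs, f.flags);
            muxer.writeSampleData(videoTrack, f.data, info);
        }
        for (CachedFrame f : audioFrames) {              // the N*T cached audio frames
            info.set(0, f.data.remaining(), f.presentationTimeUs, f.flags);
            muxer.writeSampleData(audioTrack, f.data, info);
        }

        muxer.stop();
        muxer.release();                                  // the MP4 file is now on disk
    }
}

In this sketch, the two lists passed in would correspond to the M×T video frames read from positions Mᵢ - M×T to Mᵢ of the video cache array and the N×T audio frames read from positions Nᵢ - N×T to Nᵢ of the audio cache array.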
In some embodiments, WonderfulTv invokes AudioCapture in the HAL layer to obtain audio data and caches the audio data into the audio cache array, e.g., N audio frames per second.
In some embodiments, WonderfulTv invokes ScreenCapture in the HAL layer to capture video data and caches the video data into the video cache array, e.g., M video frames per second.
In some embodiments, WonderfulTv obtains N×T audio frames from the audio cache array and M×T video frames from the video cache array, and encapsulates the N×T audio frames and the M×T video frames to generate a video file (such as the first video).
It should be understood that the above illustrates the block diagram of the electronic device based on fig. 2, and the software architecture of the embodiment of the present application is illustrated by fig. 3, but the embodiment of the present application is not limited thereto.
A method of video recording according to an embodiment of the present application is described below with reference to fig. 4 to 6. It will be appreciated that the method of video recording shown below may be implemented in an electronic device having the above-described hardware structure (e.g., the television 101 shown in fig. 1 or the electronic device shown in fig. 2).
Fig. 4 shows a schematic flow chart of a method 400 of video recording according to an embodiment of the present application. The method is applied to the electronic equipment. As shown in fig. 4, the method 400 includes:
step 401, a first instruction is received, where the first instruction is used to trigger generation of a video.
The first instruction may be understood as an instruction that triggers the video recording. The first instruction may also have other names, such as a recording instruction, a highlight instruction, or a record-video instruction, which are not limited in this embodiment.
The embodiment of the application does not limit the specific form of the first instruction. In some embodiments, the electronic device is a smart display device (such as a smart screen), and the first instruction is an instruction triggered by a user through a remote control. As shown in fig. 1, the first instruction is triggered by the user via the remote control 102.
For example, the remote control may set a "Wonderful time" key, which the user presses, i.e. triggers the electronic device to generate the video for the recording.
It should be understood that the description is given only by taking the example that the user triggers the first instruction through the remote controller, and the embodiments of the present application are not limited thereto.
In some embodiments, the duration of the recorded video may depend on a preset recording duration, such as the first duration. In other words, after the video recording instruction (corresponding to the first instruction) is received, the M×T video frames and the N×T audio frames preceding the time position of the current recording instruction are obtained from the cache according to that time position, so as to generate the video of the first duration.
Step 402, in response to the first instruction, obtaining a video frame of a first duration and an audio frame of the first duration, where the first duration is composed of T time units, T is an integer greater than or equal to 1, where the video frame of the first duration includes m×t video frames, and a first video frame of the M video frames in each time unit is a key frame, M represents a number of video frames in each time unit, M is an integer greater than or equal to 1, and the audio frame of the first duration includes n×t audio frames, N represents a number of audio frames in each time unit, and N is an integer greater than or equal to 1.
It should be understood that the units of measure of time units in the embodiments of the present application are not particularly limited. For example, the time units may be seconds, milliseconds, microseconds, and so on. The plurality of time units may constitute a first time length. For example, the time unit is seconds, the value of T is 15, and then the first duration is 15 seconds.
"the video frames of the first duration include m×t video frames" means that: there are M x T data objects (or video frames) in the video frames of the first duration. Where there are M data objects in each time unit. For example, the time unit is seconds, the value of T is 15, the value of m is 3, and then the video frames of the first duration are 45 video frames.
In addition, for the M×T video frames, the type of the first video frame of the M video frames in each time unit is an I frame. An I frame is a key frame, and may also be called an intra-coded frame. It can be understood that, in the field of video encoding and decoding, an I frame fully preserves the picture of that frame and can be encoded and decoded independently, i.e., decoding requires only the data of the frame itself and does not depend on the preceding and following video frames. The purpose of this arrangement is to ensure that the decoded pictures are coherent and to prevent corrupted (garbled) pictures from appearing in the synthesized video.
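The patent does not specify how the encoder is made to emit an I frame at the start of every time unit. One plausible way on Android, assuming the cached video frames are produced by a MediaCodec encoder, is to request a sync frame once per time unit; the sketch below is only illustrative, and the KeyFrameScheduler class and the one-second time unit are assumptions.

import android.media.MediaCodec;
import android.os.Bundle;

/** Sketch: ask a running MediaCodec video encoder to emit a key (I) frame
 *  at the start of every time unit (assumed here to be one second). */
public class KeyFrameScheduler {
    private final MediaCodec encoder;
    private long lastRequestMs = -1;

    public KeyFrameScheduler(MediaCodec encoder) {
        this.encoder = encoder;
    }

    /** Call before feeding each raw frame to the encoder. */
    public void maybeRequestSyncFrame(long nowMs) {
        if (lastRequestMs < 0 || nowMs - lastRequestMs >= 1000) {  // a new time unit begins
            Bundle params = new Bundle();
            params.putInt(MediaCodec.PARAMETER_KEY_REQUEST_SYNC_FRAME, 0);
            encoder.setParameters(params);   // encoder emits a sync (I) frame as soon as possible
            lastRequestMs = nowMs;
        }
    }
}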
"the audio frames of the first duration include n×t audio frames" means: there are N x T data objects (or audio frames) in the audio frames of the first duration. Where there are N data objects in each time unit. For example, the unit of time is seconds, the value of T is 15, the value of n is 2, and then the audio frame of the first duration (15 seconds) is 30 audio frames.
Alternatively, the value of M may not be equal to the value of N. In general, the number of buffered video frames and audio frames per time unit is different.
The method for acquiring the video frame of the first duration and the audio frame of the first duration in the embodiment of the present application is not particularly limited. The video frames of the first duration and the audio frames of the first duration may be retrieved from a buffer.
Optionally, before receiving the first instruction, the method 400 further includes:
step 402-1, respectively buffering a video frame of a second duration and an audio frame of the second duration, where the second duration is composed of a time units, a is an integer greater than or equal to 1, where the video frame of the second duration includes m×a video frames, and a first video frame of the M video frames in each time unit is a key frame, M represents a number of video frames in each time unit, M is an integer greater than or equal to 1, the audio frame of the second duration includes n×a audio frames, N represents a number of audio frames in each time unit, and N is an integer greater than or equal to 1.
Taking a second as an example of a time unit, when the electronic device is playing a video or a user watches a video through the electronic device, the electronic device can respectively start recording and buffering of audio and video data. Wherein, the audio data needs to ensure that N data objects (or data frames) are cached every second; video data needs to be guaranteed to buffer M data objects (or data frames) per second, and the first frame in the buffered M data objects per second is an I-frame. Thus, when the first instruction is received, m×t video frames before the current time and n×t audio frames before the current time may be read from the buffer.
Optionally, step 402 includes: and sequentially acquiring M x T video frames before a first moment from the M x A video frames, and sequentially acquiring N x T audio frames before the first moment from the N x A audio frames, wherein the first moment is the moment when the first instruction is received.
That is, when the first instruction is received, m×t video frames before the first time and n×t audio frames before the first time may be extracted from the buffer.
Optionally, the electronic device may maintain an array to cache the data frames, and the amount of data the array can store may be preset. When the preset amount is exceeded, old data can be cleaned up automatically. Illustratively, the electronic device may maintain an array capable of storing 200 frames of data (e.g., indices 0 to 199); when more than 200 frames have been stored, the oldest cached data is automatically overwritten. This helps reduce the storage space occupied by the cached data on the electronic device. Note that this way of maintaining the array applies to both the audio cache array and the video cache array.
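A minimal sketch of such a fixed-size cache array with automatic overwriting is given below. The FrameRingBuffer class is hypothetical (the patent only describes the behaviour, as in the 200-slot example above); the same structure could back both the audio cache array and the video cache array.

import java.util.ArrayList;
import java.util.List;

/** Sketch of a fixed-capacity frame cache that overwrites the oldest entries,
 *  as in the 200-slot example above. */
public class FrameRingBuffer<T> {
    private final Object[] slots;
    private long writeCount = 0;               // total number of frames ever written

    public FrameRingBuffer(int capacity) {     // e.g. 200
        this.slots = new Object[capacity];
    }

    /** Store a frame; once the array is full, the oldest frame is overwritten. */
    public synchronized void put(T frame) {
        slots[(int) (writeCount % slots.length)] = frame;
        writeCount++;
    }

    /** Number of frames currently held (at most the capacity). */
    public synchronized int available() {
        return (int) Math.min(writeCount, slots.length);
    }

    /** Return the last {@code count} frames before the current write position,
     *  oldest first, clamped to what is actually available. */
    @SuppressWarnings("unchecked")
    public synchronized List<T> lastFrames(int count) {
        int n = Math.min(count, available());
        List<T> out = new ArrayList<>(n);
        for (long i = writeCount - n; i < writeCount; i++) {
            out.add((T) slots[(int) (i % slots.length)]);
        }
        return out;
    }
}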
Optionally, the step of caching the audio and video data may be performed automatically in the background while the electronic device plays the video, in preparation for the subsequent synthesis of the highlight-moment video (such as the first video). For example, as shown in fig. 3, when the electronic device plays a video, the screen-recording service (ScreenRecordService) of WonderfulTv (or another screen-recording APP) may be triggered automatically. The recording service may start recording the audio and video.
Step 403, synthesizing the video frame with the first duration and the audio frame with the first duration to obtain a first video with synchronous audio and video. Optionally, the duration of the first video of the audio-visual synchronization is the first duration.
"composition" may be understood as "encapsulation". For example, the first video with synchronized audio and video may be obtained by encapsulating m×t video frames and n×t audio frames. When playing the first video, playing M x T video frames and N x T audio frames simultaneously.
In the embodiment of the present application, when the first instruction is received, the M×T video frames and N×T audio frames preceding the current moment are respectively acquired from the cache, with the first of the M video frames in each time unit guaranteed to be an I frame; synthesizing these M×T video frames and N×T audio frames yields a first video that is guaranteed to have synchronized audio and video. Audio-video synchronization means that the sound and the picture are played in step. In addition, compared with ensuring audio-video synchronization by comparing time stamps one by one, this recording method is simpler to carry out, can rapidly generate the first video, and gives a better user experience.
Optionally, a second instruction is received, where the second instruction is used to play the first video.
The form of the second instruction is not particularly limited in the embodiment of the present application. For example, the second instruction may be triggered by the user via a remote control or other device capable of controlling the electronic apparatus.
For example, when the user triggers the second instruction, the first video may be played on the screen of the electronic device.
Because of the uncertainty in the time at which the first instruction is received, there may be enough data buffered in the electronic device to generate video of the first duration when the first instruction is received, and possibly insufficient data buffered in the electronic device to generate video of the first duration.
In a first case, when a first instruction is received, data cached in the electronic device can generate a video of a first duration.
Optionally, M×A is greater than or equal to M×T, and N×A is greater than or equal to N×T. In other words, the second duration is greater than or equal to the first duration. In this case, the amount of data cached in the electronic device is sufficient to generate the video of the first duration.
For example, assuming a second duration of 60 seconds and a first duration of 20 seconds, 2 video frames per second are buffered, 3 audio frames per second are buffered, and when the electronic device is turned on to record audio and video in the background for 60 seconds, 2 x 60 video frames, and 3 x 60 audio frames are stored in the buffer. After the first instruction is currently received, 2 x 20 video frames are extracted forward from the point in time when the first instruction is currently received, and 3 x 20 audio frames are extracted forward from the point in time when the first instruction is currently received. And synthesizing based on the extracted 2 x 20 video frames and 3 x 20 audio frames to generate the audio-video synchronization first video.
In the second case, when the first instruction is received, the data buffered in the electronic device is insufficient to generate the video of the first duration. In this case, the video may be generated based on the data already in the cache.
Optionally, when M×A is less than M×T, the M×T video frames are the M×A video frames; when N×A is less than N×T, the N×T audio frames are the N×A audio frames. In other words, the second duration is less than the first duration. In this case, the amount of data cached in the electronic device is insufficient to generate the video of the first duration.
For example, assume the second duration is 5 seconds, the first duration is 15 seconds, 2 video frames are cached per second and 3 audio frames are cached per second. When the electronic device has been recording and caching audio and video in the background for only 5 seconds, 2×5 video frames and 3×5 audio frames are stored in the cache. After the first instruction is received, the 2×5 video frames and 3×5 audio frames in the cache are extracted and synthesized to generate the audio-video-synchronized first video. In this case, the duration of the first video is the second duration.
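Both cases can be handled by the same read operation if the number of frames requested is clamped to what the caches actually hold, as in the lastFrames() method of the hypothetical ring buffer sketched earlier. The ClipExtractor class below is likewise only an illustrative assumption, not the patent's implementation.

import java.util.List;

/** Sketch: gather the frames for the highlight clip when the first
 *  instruction arrives. If the cache holds less than the preset first
 *  duration (A < T), whatever has been cached is used instead. */
public class ClipExtractor {
    public static <F> List<F> extractClip(FrameRingBuffer<F> cache,
                                          int framesPerTimeUnit,   // M for video, N for audio
                                          int timeUnitsWanted) {   // T, the first duration
        int wanted = framesPerTimeUnit * timeUnitsWanted;          // M*T or N*T
        // lastFrames() clamps to the available amount, so when only M*A (or N*A)
        // frames are cached with A < T, exactly those frames are returned.
        return cache.lastFrames(wanted);
    }
}

For the two examples above, extractClip(videoCache, 2, 20) would return the 40 cached video frames in the first case, while extractClip(videoCache, 2, 15) would return only the 10 cached video frames in the second case.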
In summary, the duration of the first video generated by the electronic device may depend on how much data is cached in the electronic device.
During data caching, the embodiment of the present application needs to ensure that M video frames are cached per time unit (with the first of the M video frames being an I frame) and that N audio frames are cached per time unit. However, during caching, packet loss may occur in video encoding or audio encoding, producing data frames whose encoding failed; in that case the data in a certain time unit may fall short of the required amount. For example, for video, the cached video frames in a certain time unit may number fewer than M; for audio, the cached audio frames in a certain time unit may number fewer than N. For these cases, the present application proposes the following solutions.
Optionally, as one embodiment, among the M×T video frames, the time position corresponding to the m-th video frame in the t₁-th time unit is marked with a first identifier, t₁∈[1,T], m∈[2,M]; the first identifier is used to indicate that the m-th video frame is a data frame whose encoding failed.
The first identifier indicates that the m-th video frame in the t₁-th time unit is a data frame whose encoding failed. An "encoding failure" is understood to mean that the video frame was not encoded successfully because of the loss of video data packets or for other reasons, or that no video frame was generated at the time position corresponding to the m-th video frame in the t₁-th time unit.
Note that the m-th video frame is any frame in the t₁-th time unit other than the first video frame. This is because the first frame of each time unit is an I frame and must be generated. If a data frame other than the first video frame in a time unit fails to be generated, it can be generated from the data of the preceding and following frames, and the generated frame is a P frame. Those skilled in the art will recognize that a P frame differs from an I frame in that a P frame cannot be decoded independently, i.e., decoding a P frame relies on the data of the preceding and following frames.
Illustratively, for the T time units, if the amount of data cached in the t₁-th of the T time units is insufficient, i.e., the number of video frames is less than M, a special mark may be made at the time position (or cache index) where data is missing; for example, the first identifier is used to mark the time position corresponding to the m-th video frame. The purpose of the marking is that the video frame marked with the first identifier can be given special processing during the subsequent synthesis, as described below.
Optionally, synthesizing the video frame of the first duration and the audio frame of the first duration includes:
based on the first identifier, summing the W video frames preceding the m-th video frame and the W video frames following it and taking the average frame, using the average frame as the updated m-th video frame; and synthesizing based on the updated m-th video frame.
Illustratively, when synthesizing with the M×T video frames and the N×T audio frames, if a specially marked object is encountered (such as the m-th video frame of the t₁-th time unit marked by the first identifier), the W data objects before and after the m-th video frame may be averaged to fill that position before synthesis. Filling with an average-frame algorithm helps keep the synthesized video pictures coherent.
It should be understood that the number of W is not particularly limited in the embodiment of the present application. W may be an integer greater than or equal to 1.
It should also be understood that taking the average frame is only one example of the special processing, and embodiments of the present application are not limited thereto. Those skilled in the art may perform other special processing. For example, only the W video frames preceding the marked frame may be used to compute the average frame, or only the W video frames following it.
This is described with the example in fig. 5. Assume the time unit is seconds, the first duration is 15 seconds (i.e., T=15), and M=2. As shown in fig. 5, for video frames, when the first instruction is received at the 60th second, the electronic device (such as the screen-recording application in the electronic device) may extract from the cache (such as the video cache array) the video frames of the 15 seconds before the 60th second, i.e., the video frames of the 46th to 60th seconds shown in fig. 5. It will be appreciated that the electronic device may cache video data through a video cache array and identify each video frame by a cache index. As can be seen from fig. 5, the video frames of the 46th to 60th seconds correspond to cache indices 80 to 119. Suppose that in the 47th second (corresponding to the t₁-th time unit), video frame 84 (identified by cache index 83) failed to be encoded; video frame 84 may then be marked with the first identifier. When the first video is subsequently synthesized, video frame 84 may be filled with the average of the three (corresponding to W) video frames before and after cache index 83. For example, video frame 84 = (video frame 81 + video frame 82 + video frame 83 + video frame 85 + video frame 86 + video frame 87) / 6.
It should be understood that fig. 5 is only described by taking the example that the video frame 84 is a special mark object, and the embodiment of the present application is not limited thereto. In fact, if a video frame has a plurality of special marker objects, it can be processed in the manner described above.
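The average-frame filling in this example can be sketched as follows, assuming the neighbouring frames are available as decoded pixel buffers of equal length (averaging is only meaningful on raw pixel data, not on the compressed bitstream). The AverageFrameFiller class and its byte-buffer representation are assumptions for illustration.

/** Sketch: fill a failed video frame with the element-wise average of the
 *  W frames before it and the W frames after it. */
public class AverageFrameFiller {
    public static byte[] fillByAverage(byte[][] frames, int failedIndex, int w) {
        int[] sum = null;
        int count = 0;
        for (int i = failedIndex - w; i <= failedIndex + w; i++) {
            if (i == failedIndex || i < 0 || i >= frames.length || frames[i] == null) {
                continue;                        // skip the failed frame itself and missing neighbours
            }
            if (sum == null) {
                sum = new int[frames[i].length];
            }
            for (int p = 0; p < sum.length; p++) {
                sum[p] += frames[i][p] & 0xFF;   // treat bytes as unsigned pixel values
            }
            count++;
        }
        if (count == 0) {
            throw new IllegalStateException("no neighbouring frames available");
        }
        byte[] filled = new byte[sum.length];
        for (int p = 0; p < filled.length; p++) {
            filled[p] = (byte) (sum[p] / count); // per-pixel average of the neighbouring frames
        }
        return filled;
    }
}

With W = 3 and the failed frame at cache index 83, fillByAverage(videoCacheArray, 83, 3) would compute video frame 84 from video frames 81 to 83 and 85 to 87, matching the formula above.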
Optionally, as an embodiment, a time position corresponding to an n-th audio frame of a t2-th time unit among the N*T audio frames is marked by a second identifier, t2 ∈ [1, T], n ∈ [1, N]; the second identifier is used to characterize that the n-th audio frame is a data frame that failed encoding.
The second identifier is used to characterize that the n-th audio frame of the t2-th time unit is a data frame whose encoding failed. "Encoding failure" may be understood as an audio frame not being successfully encoded due to audio packet loss or other reasons, or as no audio frame being generated at the time position corresponding to the n-th audio frame in the t2-th time unit.
Illustratively, for the T time units, if the amount of data buffered in the t2-th time unit of the T time units is insufficient, i.e. the number of audio frames is less than N, a special mark may be made at the time position (or buffer index) where data is missing, for example the second identifier is used to mark the time position corresponding to the n-th audio frame of the t2-th time unit. The purpose of the marking here is that the audio frame marked by the second identifier can be specially processed during subsequent synthesis. The special processing is described below.
Optionally, synthesizing the video frame of the first duration and the audio frame of the first duration includes:
skipping the n-th audio frame for synthesis based on the second identifier, where skipping the n-th audio frame includes: synthesizing using the audio frames, among the N*T audio frames, other than the n-th audio frame.
"skip" is understood to mean that the t-th is not used in the audio-video encapsulation 2 The nth audio frame of the time unit is synthesized, and the time position of the second identification mark is directly skipped for synthesis. Alternatively, a processing party other than "skipCan be used for the t 2 The nth audio frame of a time unit is processed as a null data frame.
Illustratively, when synthesizing with the M*T video frames and the N*T audio frames, if a specially marked object is encountered (such as the n-th audio frame of the t2-th time unit marked by the second identifier), the specially marked object may be skipped before synthesizing.
Alternatively, as an implementation manner, the synthesis may also be performed by using a null data packet, that is, the nth audio frame is synthesized as null data.
Illustratively, when synthesizing with the M*T video frames and the N*T audio frames, if a specially marked object is encountered (such as the n-th audio frame of the t2-th time unit marked by the second identifier), the specially marked object is synthesized as a null data packet. Because the duration of a null data packet is relatively short, a user watching a video obtained by synthesizing with null data packets usually does not perceive the pause, and the viewing experience of the user is not affected.
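The two handling options described above ("skip" and "null data packet") can be illustrated with a minimal sketch, assuming the audio frames are byte strings kept in a Python list ordered by buffer index and that the positions marked by the second identifier are collected in a set of failed indices; these data structures are assumptions made only for illustration:

def audio_frames_for_muxing(audio_frames, failed_indices, use_null_packets=False):
    out = []
    for idx, frame in enumerate(audio_frames):
        if idx in failed_indices:
            if use_null_packets:
                out.append(b"")  # substitute a null data packet at this position
            # otherwise the marked frame is skipped entirely
            continue
        out.append(frame)
    return out

# e.g. skipping audio frame 141 at buffer index 140, as in fig. 6:
# frames = audio_frames_for_muxing(audio_cache, failed_indices={140})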
It should be understood that embodiments of the present application do not particularly limit the relationship between t1 and t2. The two may be the same or different.
Described in connection with the example in fig. 6. Assume that the time unit is seconds, the first duration is 15 seconds (i.e., T=15), and N=3. As shown in fig. 6, when the first instruction is received at the 60th second, the electronic device (such as a screen recording application in the electronic device) may extract from the cache (such as an audio cache array) the audio frames of the 15 seconds before the 60th second, that is, the audio frames of the 46th to 60th seconds shown in fig. 6. It will be appreciated that the electronic device may cache audio data through an audio cache array and identify each audio frame by its buffer index. As can be seen from fig. 6, the audio frames of the 46th to 60th seconds correspond to buffer index 135 to buffer index 179. Assume that in the 47th second (corresponding to the t2-th time unit), audio frame 141 (identifiable by buffer index 140) fails to be encoded; audio frame 141 may then be marked by the second identifier. When the first video is subsequently synthesized, audio frame 141 may be skipped.
It should be understood that fig. 6 is described only by taking audio frame 141 as the specially marked object, and embodiments of the present application are not limited thereto. In fact, if there are multiple specially marked audio frames, each of them can be processed in the manner described above.
It should also be understood that the above-described special processing manner for video frames and processing manner for audio frames may be implemented separately or in combination, which is not limited in this embodiment of the present application. For example, when the first video is synthesized, the video frames from 46 th to 60 th seconds in fig. 5 and the audio frames from 46 th to 60 th seconds in fig. 6 can be extracted and synthesized, so as to obtain the 15 second video with synchronous audio and video.
It should also be understood that the examples in fig. 5 and 6 are merely schemes that facilitate understanding of embodiments of the present application by those skilled in the art, and embodiments of the present application are not limited thereto.
Of course, when the number of data frames in a certain time unit exceeds a certain number, the redundant data frames may be discarded. For example, when the video frames in a certain time unit exceed M frames, the redundant video frames are discarded. For another example, when the audio frames in a certain time unit exceed N frames, the redundant audio frames are discarded.
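A minimal sketch of this capping behaviour is given below, assuming that each cached frame carries a timestamp in seconds and that one time unit equals one second; both assumptions are made only for illustration, since the embodiments do not fix the time unit:

from collections import defaultdict

def cap_per_time_unit(frames_with_ts, limit):
    # Keep at most `limit` frames per time unit; surplus frames are dropped.
    kept, counts = [], defaultdict(int)
    for ts, frame in frames_with_ts:
        unit = int(ts)  # the second (time unit) this frame falls into
        if counts[unit] < limit:
            counts[unit] += 1
            kept.append((ts, frame))
    return kept

# e.g. cap_per_time_unit(video_frames, M) or cap_per_time_unit(audio_frames, N)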
The method for recording video provided in the embodiment of the present application is described in detail above with reference to fig. 1 to 6. An embodiment of the device of the present application will be described in detail below in conjunction with fig. 7. It should be understood that the video recording apparatus according to the embodiments of the present application may perform the various video recording methods according to the embodiments of the present application, that is, the following specific working processes of various products may refer to the corresponding processes in the embodiments of the foregoing methods.
Fig. 7 is a schematic block diagram of an apparatus 700 for video recording in an embodiment of the present application. It should be appreciated that the apparatus 700 may perform the method of video recording shown in fig. 4-6.
As shown in fig. 7, the apparatus 700 for video recording includes: communication unit 710, acquisition unit 720, synthesis unit 730. Optionally, the apparatus 700 further comprises a display unit 740. In one possible example, the apparatus 700 may be a smart display device.
In one example, the communication unit 710 is configured to receive a first instruction, where the first instruction is configured to trigger generation of a video;
the obtaining unit 720 is configured to obtain, in response to the first instruction, a video frame of a first duration and an audio frame of a first duration, where the first duration is composed of T time units, T is an integer greater than or equal to 1, the video frame of the first duration includes m×t video frames, and a first video frame of the M video frames in each time unit is a key frame, M represents a number of video frames in each time unit, M is an integer greater than or equal to 1, the audio frame of the first duration includes n×t audio frames, N represents a number of audio frames in each time unit, N is an integer greater than or equal to 1, and "×" represents a multiplication operation;
The synthesizing unit 730 is configured to synthesize the video frame of the first duration and the audio frame of the first duration, so as to obtain a first video with synchronous audio and video.
Optionally, as an embodiment, a time position corresponding to an m-th video frame of a t1-th time unit among the M*T video frames is marked by a first identifier, t1 ∈ [1, T], m ∈ [2, M]; wherein the first identifier is used to characterize that the m-th video frame is a data frame that failed encoding;
the synthesizing unit 730 is configured to synthesize the video frame of the first duration and the audio frame of the first duration, and specifically includes:
summing, based on the first identifier, the W video frames preceding the m-th video frame and the W video frames following the m-th video frame, taking the average frame, and using the average frame as an updated m-th video frame; and synthesizing based on the updated m-th video frame.
Optionally, as an embodiment, a time position corresponding to an n-th audio frame of a t2-th time unit among the N*T audio frames is marked by a second identifier, t2 ∈ [1, T], n ∈ [1, N]; the second identifier is used to characterize that the n-th audio frame is a data frame that failed encoding;
The synthesizing unit 730 is configured to synthesize the video frame of the first duration and the audio frame of the first duration, and specifically includes:
skipping the n-th audio frame for synthesis based on the second identifier, where skipping the n-th audio frame includes: synthesizing using the audio frames, among the N*T audio frames, other than the n-th audio frame.
Optionally, as an embodiment, before receiving the first instruction, the obtaining unit 720 is further configured to:
respectively caching video frames of a second time length and audio frames of the second time length, wherein the second time length consists of A time units, A is an integer greater than or equal to 1, wherein the video frames of the second time length comprise M x A video frames, the first video frame of the M video frames in each time unit is a key frame, M represents the number of the video frames in each time unit, M is an integer greater than or equal to 1, the audio frames of the second time length comprise N x A audio frames, N represents the number of the audio frames in each time unit, N is an integer greater than or equal to 1, and 'x' represents multiplication;
the acquiring unit 720 is configured to acquire a video frame of a first duration and an audio frame of a first duration, and specifically includes:
And sequentially acquiring M x T video frames before a first moment from the M x A video frames, and sequentially acquiring N x T audio frames before the first moment from the N x A audio frames, wherein the first moment is the moment when the first instruction is received.
Optionally, as an embodiment, m×a is greater than or equal to m×t, and n×a is greater than or equal to n×t.
Optionally, as an embodiment, when m×a is less than m×t, the m×t video frames are the m×a video frames; when n×a is smaller than n×t, the n×t audio frames are the n×a audio frames.
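A minimal sketch of this acquisition step is given below, assuming the cached frames are kept in a simple Python list ordered from oldest to newest; the list-based cache is an assumption made only for illustration:

def frames_before_first_moment(cache, per_unit, t_units):
    wanted = per_unit * t_units   # M*T for video, N*T for audio
    if len(cache) <= wanted:      # fewer frames cached (e.g. M*A < M*T): use them all
        return list(cache)
    return cache[-wanted:]        # otherwise the most recent M*T (or N*T) frames

# e.g. video = frames_before_first_moment(video_cache, M, T)
#      audio = frames_before_first_moment(audio_cache, N, T)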
Optionally, as an embodiment, the first duration is a preset duration, where an end time of the first duration is a time of receiving the first instruction.
Optionally, as an embodiment, the apparatus 700 is a smart display device, and the first instruction is an instruction triggered by a user through a controller, where the controller establishes a communication connection with the smart display device.
In one possible example, the communication unit 710 may be implemented by a wireless communication module. The acquisition unit 720 and the synthesis unit 730 may be implemented by a processor or a processing unit. The display unit 740 may be implemented by a screen. It should be appreciated that the apparatus 700 described above is embodied in the form of functional units. The term "unit" herein may be implemented in the form of software and/or hardware, which is not particularly limited in the embodiments of the present application.
For example, a "unit" may be a software program, a hardware circuit or a combination of both that implements the functions described above. The hardware circuitry may include (ASIC) application specific integrated circuits, electronic circuits, processors (e.g., shared, dedicated, or group processors, etc.) and memory that execute one or more software or firmware programs, integrated logic circuits, and/or other suitable devices that provide the above described functionality. In a simple embodiment, one skilled in the art will recognize that the apparatus 700 may take the form shown in FIG. 2.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The present application also provides a computer program product which, when executed by a processor, implements the method of any of the method embodiments of the present application.
The computer program product may be stored in a memory and eventually converted to an executable object file that can be executed by a processor through preprocessing, compiling, assembling, and linking.
The present application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a computer, implements a method according to any of the method embodiments of the present application. The computer program may be a high-level language program or an executable object program.
The computer readable storage medium may be volatile memory or nonvolatile memory, or may include both volatile memory and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be random access memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a usb disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk, or other various media capable of storing program codes.
It should be understood that, in various embodiments of the present application, the size of the sequence number of each process does not mean that the execution sequence of each process should be determined by its functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
In addition, the terms "system" and "network" are often used interchangeably herein. The term "and/or" herein is merely one association relationship describing the associated object, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship. For example, A/B may represent A or B.
The terms (or numbers) of "first," "second," … and the like in the embodiments of the present application are used for descriptive purposes only and are not to be construed as indicating or implying any particular importance or number of features indicated or as defining a particular order or sequence unless otherwise indicated. Thus, features defining "first", "second", …, etc., may include one or more features, either explicitly or implicitly. In the description of the embodiments of the present application, "at least one (an item)" means one or more. The meaning of "plurality" is two or more. "at least one of (an) or the like" below means any combination of these items, including any combination of a single (an) or a plurality (an) of items.
For example, an item expressed as "in the embodiments of the present application ... includes at least one of: A, B, and C" generally means, unless otherwise specified, that the item may be any one of the following: A; B; C; A and B; A and C; B and C; A, B and C; A and A; A, A and A; A, A and B; A, A and C; A, B and B; A, C and C; B and B; B, B and C; C and C; C, C and C; and other combinations of A, B and C. The above are the optional entries for an item exemplified with the three elements A, B and C; when the expression is "the item includes at least one of: A, B, ..., and X", i.e. when more elements appear in the expression, the entries to which the item applies can likewise be obtained according to the foregoing rules.
In summary, the foregoing description is only a preferred embodiment of the technical solution of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application.

Claims (9)

1. A method of video recording, the method being applied to an electronic device, the method comprising:
Receiving a first instruction, wherein the first instruction is used for triggering generation of video, and the first instruction is an instruction for triggering video recording;
responding to the first instruction, respectively obtaining a video frame of a first duration and an audio frame of the first duration before a first time, wherein the first duration is composed of T time units, T is an integer greater than 1, the video frames of the first duration comprise M*T video frames, the first video frame of the M video frames in each time unit is a key frame, M represents the number of video frames in each time unit, M is an integer greater than or equal to 1, the audio frames of the first duration comprise N*T audio frames, N represents the number of audio frames in each time unit, N is an integer greater than or equal to 1, "*" represents a multiplication operation, and the first time is the time at which the first instruction is received;
synthesizing the video frames with the first time length and the audio frames with the first time length to obtain a first video with synchronous audio and video;
before receiving the first instruction, the method further includes:
respectively caching video frames of a second time length and audio frames of the second time length, wherein the second time length is composed of A time units, A is an integer greater than 1, the video frames of the second time length comprise M x A video frames, and the audio frames of the second time length comprise N x A audio frames;
The method for obtaining the video frames and the audio frames of the first time length respectively comprises the following steps:
sequentially acquiring m×t video frames before a first time from the m×a video frames, and sequentially acquiring n×t audio frames before the first time from the n×a audio frames;
wherein M*A is greater than or equal to M*T, and N*A is greater than or equal to N*T; or, when M*A is less than M*T, the M*T video frames are the M*A video frames; and when N*A is less than N*T, the N*T audio frames are the N*A audio frames.
2. The method of claim 1, wherein a time position corresponding to an m-th video frame of a t1-th time unit among the M*T video frames is marked by a first identifier, t1 ∈ [1, T], m ∈ [2, M]; wherein the first identifier is used to characterize that the m-th video frame is a data frame that failed encoding;
the synthesizing the video frame with the first duration and the audio frame with the first duration includes:
summing, based on the first identifier, the W video frames preceding the m-th video frame and the W video frames following the m-th video frame, taking the average frame, and using the average frame as an updated m-th video frame; and synthesizing based on the updated m-th video frame, wherein W is an integer greater than or equal to 1.
3. The method according to claim 1 or 2, wherein a time position corresponding to an n-th audio frame of a t2-th time unit among the N*T audio frames is marked by a second identifier, t2 ∈ [1, T], n ∈ [1, N]; the second identifier is used to characterize that the n-th audio frame is a data frame that failed encoding;
the synthesizing the video frame with the first duration and the audio frame with the first duration includes:
skipping the n-th audio frame for synthesis based on the second identifier, where skipping the n-th audio frame for synthesis includes: synthesizing using the audio frames, among the N*T audio frames, other than the n-th audio frame of the t2-th time unit.
4. The method of claim 1, wherein M x a is greater than or equal to M x T and N x a is greater than or equal to N x T.
5. The method of claim 1, wherein when M x a is less than M x T, the M x T video frames are the M x a video frames; when n×a is smaller than n×t, the n×t audio frames are the n×a audio frames.
6. The method of claim 1, 2, 4, or 5, wherein the first duration is a preset duration, and wherein an end time of the first duration is a time at which the first instruction is received.
7. The method of claim 1, 2, 4 or 5, wherein the electronic device is a smart display device and the first instruction is an instruction triggered by a user via a controller, the controller establishing a communication connection with the smart display device.
8. An electronic device comprising a processor and a memory, the processor and the memory being coupled, the memory being for storing a computer program that, when executed by the processor, causes the electronic device to perform the method of any one of claims 1 to 7.
9. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program, which when executed by a processor causes the processor to perform the method of any of claims 1 to 7.
CN202211066834.3A 2022-09-01 2022-09-01 Video recording method and device Active CN115589450B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211066834.3A CN115589450B (en) 2022-09-01 2022-09-01 Video recording method and device

Publications (2)

Publication Number Publication Date
CN115589450A CN115589450A (en) 2023-01-10
CN115589450B true CN115589450B (en) 2024-04-05

Family

ID=84772024

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211066834.3A Active CN115589450B (en) 2022-09-01 2022-09-01 Video recording method and device

Country Status (1)

Country Link
CN (1) CN115589450B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102821308A (en) * 2012-06-04 2012-12-12 西安交通大学 Multi-scene streaming media courseware recording and direct-broadcasting method
CN109862384A (en) * 2019-03-13 2019-06-07 北京河马能量体育科技有限公司 A kind of audio-video automatic synchronous method and synchronization system
CN110062277A (en) * 2019-03-13 2019-07-26 北京河马能量体育科技有限公司 A kind of audio-video automatic synchronous method and synchronization system
CN110225348A (en) * 2019-06-24 2019-09-10 北京大米科技有限公司 Restorative procedure, device, electronic equipment and the storage medium of video data
CN111225171A (en) * 2020-01-19 2020-06-02 普联技术有限公司 Video recording method, device, terminal equipment and computer storage medium
CN112929713A (en) * 2021-02-07 2021-06-08 Oppo广东移动通信有限公司 Data synchronization method, device, terminal and storage medium
CN113572954A (en) * 2021-06-15 2021-10-29 荣耀终端有限公司 Video recording method, electronic device, medium, and program product
CN114979533A (en) * 2022-05-18 2022-08-30 青岛海信移动通信技术股份有限公司 Video recording method, device and terminal

Also Published As

Publication number Publication date
CN115589450A (en) 2023-01-10

Similar Documents

Publication Publication Date Title
JP6651226B2 (en) Method and apparatus for displaying information presentation items and multimedia playback device
CN111405221B (en) Display device and display method of recording file list
CN113395562B (en) Display device and boot animation display method
CN111556350B (en) Intelligent terminal and man-machine interaction method
CN111866590A (en) Display device
CN112073662A (en) Display device
US9729931B2 (en) System for managing detection of advertisements in an electronic device, for example in a digital TV decoder
CN113038210B (en) Double-screen synchronous playing method of video file and display equipment
CN111935510B (en) Double-browser application loading method and display equipment
CN115589450B (en) Video recording method and device
CN111741314A (en) Video playing method and display equipment
TW202114398A (en) Image transmission device, image display system with remote screen capture function, and remote screen capture method
CN112783380A (en) Display apparatus and method
CN114501087B (en) Display equipment
CN112073812B (en) Application management method on smart television and display device
CN112073356B (en) Data transmission method and display device
CN111629250A (en) Display device and video playing method
CN112071338A (en) Recording control method and device and display equipment
CN115174991B (en) Display equipment and video playing method
CN114979736B (en) Display device and audio and video synchronization method
CN112073811B (en) File transmission scheduling method and display device
CN115604540B (en) Video acquisition method, electronic equipment and medium
CN115086722B (en) Display method and display device for secondary screen content
CN112071337B (en) Recording control method and device, server and display equipment
CN116708390A (en) Display device, method for displaying patch advertisement, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant