CN115604540A - Video acquisition method and device

Video acquisition method and device

Info

Publication number
CN115604540A
CN115604540A
Authority
CN
China
Prior art keywords
video data, video, data object, audio data, audio
Prior art date
Legal status
Granted
Application number
CN202211065711.8A
Other languages
Chinese (zh)
Other versions
CN115604540B (en)
Inventor
肖瑶
杨毅轩
林晨
Current Assignee
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Application filed by Honor Device Co Ltd
Priority to CN202211065711.8A
Publication of CN115604540A
Application granted
Publication of CN115604540B
Current legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N 21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N 21/8456 Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N 21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N 21/43072 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N 21/44016 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/85 Assembly of content; Generation of multimedia applications
    • H04N 21/854 Content authoring
    • H04N 21/8547 Content authoring involving timestamps for synchronizing content
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 Details of television systems
    • H04N 5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N 5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N 5/265 Mixing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 Details of television systems
    • H04N 5/76 Television signal recording
    • H04N 5/91 Television signal processing therefor
    • H04N 5/93 Regeneration of the television signal or of selected parts thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Television Signal Processing For Recording (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

The application provides a video acquisition method and apparatus, including the following steps: playing a first video; receiving a record-back signal for the first video; in response to the record-back signal, searching the played video data for a target video data object whose distance from the last video data object equals the record-back duration; determining a first start identifier at least according to the identifier of the target video data object; acquiring video data from the played video data, starting with the video data object pointed to by the first start identifier; determining the interval between the first start identifier and the identifier of the last video data object; searching the played audio data for an audio data object separated from the last audio data object by that interval, and taking the identifier of the found audio data object as a second start identifier; acquiring audio data from the played audio data, starting with the audio data object pointed to by the second start identifier; and generating a second video.

Description

Video acquisition method and device
Technical Field
The present application relates to the field of multimedia technologies, and in particular, to a video acquisition method and apparatus.
Background
When an electronic device runs a video application (APP) or a live-streaming APP, the device may record video data and audio data, recording a timestamp each time a piece of video data is recorded and each time a piece of audio data is recorded. The device can invoke a record-back service to extract video data and audio data from what has been recorded. Specifically, the device first determines the start time for video data extraction, compares that start time with the timestamp of each piece of audio data to determine the start time for audio data extraction, extracts video data and audio data according to the respective start times, and synthesizes a video from the extracted data. However, with this approach the device takes a long time to determine the start time of the audio data, which lowers video acquisition efficiency.
Disclosure of Invention
The application provides a video acquisition method and apparatus, aiming to solve the problem that timestamp comparison reduces video acquisition efficiency. To achieve this, the application provides the following technical solutions:
In a first aspect, the present application provides a video acquisition method, including: playing a first video; during playback of the first video, receiving a record-back signal for the first video, where the record-back signal indicates that a second video is to be generated from the played video data and played audio data of the first video; in response to the record-back signal, searching the played video data for a target video data object whose distance from the last video data object in the played video data equals the record-back duration; determining a first start identifier at least according to the identifier of the target video data object; taking the video data object pointed to by the first start identifier as the first frame and, starting from the first frame, acquiring video data from the played video data; determining the interval between the first start identifier and the identifier of the last video data object; searching the played audio data for an audio data object separated from the last audio data object by that interval, and taking the identifier of the found audio data object as a second start identifier; taking the audio data object pointed to by the second start identifier as the first audio segment and, starting from it, acquiring audio data from the played audio data; and generating the second video, the second video including the acquired video data and audio data.
With this video acquisition method, the electronic device finds the first start identifier by searching back from the last video data object of the played video data. After determining the interval between the start identifier used for video data extraction and the identifier of the last video data object, it finds the second start identifier using that interval and the identifier of the last audio data object of the played audio data; specifically, the identifier of the audio data object that lies that interval before the last audio data object is determined as the second start identifier. Compared with timestamp comparison, the electronic device can use the interval and the identifier of the last audio data object directly to quickly locate the first audio segment, which improves video acquisition efficiency.
The time corresponding to the first start identifier is the start time of the video data, which is the time of the first frame within the first video; the time corresponding to the second start identifier is the start time of the audio data, which is likewise the time of the first audio segment within the first video. The second video is a sub-video of the first video (i.e., a segment of the first video); the first start identifier may correspond to the start identifier of the video data described below, and the second start identifier to the start identifier of the audio data described below.
In one possible implementation, determining the first start identifier at least according to the identifier of the target video data object includes: if the target video data object is an I frame, determining the identifier of the target video data object as the first start identifier; if the target video data object is a B frame or a P frame, searching the played video data, at least according to the target video data object, for the first video data object carrying an I-frame identifier that corresponds to the target video data object, and determining the identifier of that video data object as the first start identifier. The first video data object with an I-frame identifier may be the I-frame-identified video data object closest to the target video data object, where closest means that the position difference between the two is smaller than the position difference between the target video data object and any other I-frame-identified video data object; the position difference represents the positional interval of two video data objects in the first video. In other words, no video data object with an I-frame identifier lies between the found video data object and the target video data object.
An I frame can be decompressed into a picture by a video decompression algorithm without reference to any P frame or B frame, and an I frame completely retains its picture information. Determining the identifier of an I-frame-identified video object as the first start identifier therefore means that the first frame extracted by the electronic device is an I frame: the first picture can be restored accurately, and P frames or B frames following the I frame can be decompressed with reference to it, improving accuracy.
In a possible implementation, if the target video data object is a B frame or a P frame, searching the played video data for the first video data object with an I-frame identifier corresponding to the target video data object includes: searching from the target video data object toward the last video data object. That is, starting with the target video data object, the search looks for the first I-frame-identified video data object located after it, with the last video data object as the final object searched.
In one possible implementation, taking the video data object pointed to by the first start identifier as the first frame and acquiring video data from the played video data includes: acquiring video data objects from the first frame up to and including the last video data object, and combining them to obtain the video data, whose duration is less than or equal to the record-back duration. If the target video data object is an I frame, it is itself the first frame; the electronic device may extract the video data objects from the target video data object to the last video data object (inclusive), the interval between them equals the record-back duration, and the duration of the obtained video data equals the record-back duration. As shown in fig. 7, the video data object cached at position 50 is the target video data object and position 50 is its identifier; but if the object at position 50 is a B frame or a P frame, the first I-frame-identified video data object is searched for from position 50 toward later positions. The object found at position 60 carries an I-frame identifier, so the object cached at position 60 becomes the first frame, video data objects are extracted from position 60 to position 500, and the duration of the video data is less than the record-back duration. A sketch of this lookup follows.
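The following Java sketch illustrates this lookup under stated assumptions: played frames sit in an array-like buffer, each carrying the I-frame flag described above; the type and method names are illustrative, not from the patent.

```java
// Hypothetical frame type: encoded payload plus the I-frame flag described above.
record VideoFrame(byte[] data, boolean isIFrame) {}

final class RecordBackLookup {
    // Jump back S*T positions from the last frame, then scan forward to the
    // nearest frame carrying an I-frame flag (cf. the fig. 7 example: 50 -> 60).
    static int findFirstFrameIndex(VideoFrame[] played, int lastIndex,
                                   int framesPerSecond /* S */, int recordBackSeconds /* T */) {
        int target = Math.max(0, lastIndex - framesPerSecond * recordBackSeconds);
        for (int i = target; i <= lastIndex; i++) {   // forward search, target -> last
            if (played[i].isIFrame()) {
                return i;   // first start identifier; resulting duration may be < T
            }
        }
        return target;      // fallback if no I frame is found (illustrative only)
    }
}
```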
In a possible implementation, if the target video data object is a B frame or a P frame, searching the played video data for the first video data object with an I-frame identifier corresponding to the target video data object includes: searching from the target video data object back toward the first video data object in the played video data. If the video data objects in the played video data change, the first video data object may change as well; for example, the first video data object in fig. 7 is the object at position 1, while in fig. 8 it is the object at position 11.
In one possible implementation, taking the video data object pointed to by the first start identifier as the first frame and acquiring video data from the played video data includes: acquiring video data objects from the first frame up to the video data object that is one record-back duration away from the first frame, and combining them to obtain video data, thereby satisfying the requirement of acquiring video data of the record-back duration. Alternatively, it includes: acquiring video data objects from the first frame up to the last video data object and combining them to obtain video data, in which case the duration of the video data is greater than the record-back duration, satisfying the requirement of recording back from the last video data object.
In one possible implementation, the method further includes: if an acquired video data object is a special video object, disabling the special video object; or, if an acquired video data object is a special video object, adjusting the special video object according to the video data objects related to it in the played video data, the adjusted special video object being used for acquiring video data. In some examples, disabling may mean deleting or discarding: the electronic device does not use the special video object because it is not a frame of the video, and disabling it reduces the impact on the second video and prevents frames unrelated to the first video from appearing in it. In some examples, the adjustment may generate a video data object from the related video data objects in the played video data and add the generated object to the video data. The generated object typically stands in for a frame that was lost during buffering, so adding the regenerated video data object to the video data can improve accuracy.
In one possible implementation, adjusting the special video object according to the related video data objects in the played video data includes: adjusting the special video object according to the N video data objects before it and/or the M video data objects after it in the played video data, where N and M are natural numbers greater than or equal to 1. For example, using the 5 video frames before the special video object and the 5 video frames after it, the image data of the special video object can be obtained from the image data of those 10 video frames, yielding one video frame. If the average of the pixel values of the same pixel across the 10 video frames is taken as the pixel value of the corresponding pixel in the special video object, the image data of the special video object is obtained once all pixels have been averaged, as in the sketch below.
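A minimal sketch of this averaging, assuming each neighboring frame has been decoded to an array of per-channel sample values of equal length (the patent does not prescribe an implementation; averaging packed multi-channel pixels directly would require per-channel treatment):

```java
// Sketch: rebuild a special (lost) frame by averaging, sample by sample, the
// decoded data of the N frames before it and M frames after it.
static int[] reconstructSpecialFrame(java.util.List<int[]> neighbors) {
    int len = neighbors.get(0).length;
    int[] out = new int[len];
    for (int p = 0; p < len; p++) {
        long sum = 0;
        for (int[] frame : neighbors) {
            sum += frame[p];                          // same position in each neighbor
        }
        out[p] = (int) (sum / neighbors.size());      // per-sample average
    }
    return out;                                       // image data of the special object
}
```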
In a possible implementation, taking the audio data object pointed to by the second start identifier as the first audio segment and acquiring audio data from the played audio data includes: acquiring audio data objects from the first audio segment up to and including the last audio data object, and combining the acquired audio data objects to obtain the audio data.
In one possible implementation, the method further includes: if an acquired audio data object is a special audio object, disabling the special audio object.
In a possible implementation, the played video data is cached in a first cache space; the position of the last video data object in the first cache space is a first position, and the first position is the identifier of the last video data object. In response to the record-back signal, searching the played video data for the target video data object whose distance from the last video data object equals the record-back duration includes: determining the video data object that belongs to the first video and lies S x T positions before the first position in the first cache space as the target video data object, where S is the number of video data objects cached per unit time and T is the record-back duration. Determining the first start identifier at least according to the identifier of the target video data object includes: starting from the target video data object, searching the first cache space for the first video data object with an I-frame identifier that belongs to the first video, obtaining its position in the first cache space, and determining that position as the first start identifier. Video data objects are then acquired from that I-frame-identified object up to the last video data object. As a result, the duration of the video data may be less than the record-back duration, but the requirement of recording back from the last cached video data object is met; and since the first frame of the video data carries an I-frame identifier, no P frame or B frame is referenced when decompressing it and its picture information is complete, so the first picture can be restored accurately and subsequent P or B frames can be decompressed with reference to the I frame, improving accuracy. The first cache space may be the video loop array described below.
In a possible implementation, the played audio data is cached in a second cache space; the position of the last audio data object in the second cache space is a second position, and the second position is the identifier of the last audio data object. Determining the interval between the first start identifier and the identifier of the last video data object includes: determining the position difference between the position corresponding to the first start identifier and the first position as the interval. Searching the played audio data for an audio data object separated from the last audio data object by that interval, and taking its identifier as the second start identifier, includes: determining the total number B of audio data objects as B = A x (gap / S), where A is the number of audio data objects cached per unit time and gap is the interval; then finding the audio data object that belongs to the first video and lies B positions before the second position in the second cache space, and determining its position in the second cache space as the second start identifier.
If A x (gap / S) is an integer, the interval duration between the audio data object pointed to by the second start identifier and the last audio data object equals the interval duration between the video data object pointed to by the first start identifier and the last video data object; since the time of the last video data object matches the time of the last audio data object, the start time corresponding to the first start identifier equals the start time corresponding to the second start identifier, so video data and audio data of the same duration and the same start time are obtained and the second video is audio-video synchronized. If A x (gap / S) is not an integer, the electronic device may round it so that the two start times are close to each other, which still ensures audio-video synchronization, as in the sketch below. The second cache space may be the audio loop array described below.
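A minimal sketch of this index arithmetic (Java; the rounding choice and all names are assumptions):

```java
// Sketch: map the video-side interval to an audio index, per B = A * (gap / S).
// A = audio objects cached per second, S = video objects cached per second,
// gap = positions between the first start identifier and the last video object.
static int findSecondStartIndex(int lastAudioIndex, int gap,
                                int audioPerSecond /* A */, int videoPerSecond /* S */) {
    // Round to the nearest whole object when A * (gap / S) is not an integer,
    // keeping the audio start time close to the video start time.
    int b = Math.round(audioPerSecond * (gap / (float) videoPerSecond));
    return Math.max(0, lastAudioIndex - b);   // second start identifier
}
```

For instance, with A = 15, S = 30, and gap = 60, B = 30: the second start identifier lies 30 audio objects before the last one, i.e. two seconds of audio, matching the two seconds of video that 60 frames at 30 frames per second represent.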
In one possible implementation, before responding to the record-back signal, the method further includes: acquiring video data objects and audio data objects in each unit time; if the number of video data objects acquired in a unit time is greater than a preset first number, deleting the excess video data objects so that the preset first number remain, the preset first number being the number of video data objects that can be cached per unit time; if the number of video data objects acquired in a unit time is less than the preset first number, adding special video objects identified by a special mark, so that the added special video objects and the acquired video data objects together total the preset first number; if the number of audio data objects acquired in a unit time is greater than a preset second number, deleting the excess audio data objects so that the preset second number remain, the preset second number being the number of audio data objects that can be cached per unit time; and if the number of audio data objects acquired in a unit time is less than the preset second number, adding special audio objects identified by a special mark, so that the added special audio objects and the acquired audio data objects together total the preset second number. In this way, the preset first number of video data objects and the preset second number of audio data objects are cached in each unit time.
In one possible implementation, deleting the excess video data objects includes: deleting the video data objects acquired after the S-th video data object, S being the preset first number. Deleting the excess audio data objects includes: deleting the audio data objects acquired after the A-th audio data object, A being the preset second number. A sketch of this normalization follows.
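A sketch of this per-second normalization, generic over the object type (all names are illustrative assumptions):

```java
// Sketch: normalize one second's worth of captured objects to a fixed count.
// Objects past the preset count are dropped; shortfalls are padded with
// specially marked placeholder objects, as described above.
static <T> java.util.List<T> normalizePerSecond(java.util.List<T> captured,
                                                int presetCount, T specialObject) {
    if (captured.size() > presetCount) {
        // Keep only the first presetCount objects; delete the rest.
        return new java.util.ArrayList<>(captured.subList(0, presetCount));
    }
    java.util.List<T> out = new java.util.ArrayList<>(captured);
    while (out.size() < presetCount) {
        out.add(specialObject);   // special object identified by a special mark
    }
    return out;
}
```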
In one possible implementation, the method further includes: if the number of remaining positions in the first cache space is greater than or equal to the preset first number, writing the acquired video data objects after the last video data object in the first cache space; if the number of remaining positions is less than the preset first number, writing part of the acquired video data objects from the position after the last video data object up to the final position of the first cache space, and writing the remaining video data objects from the first position of the first cache space; and determining the position of each video data object in the first cache space as its identifier. Video data objects are thus cached by multiplexing positions in the first cache space and distinguished by their identifiers.
In one possible implementation, the method further includes: after the preset second number of audio data objects are obtained, if the number of remaining positions in the second cache space is greater than or equal to the preset second number, writing the acquired audio data objects after the last audio data object in the second cache space; if the number of remaining positions is less than the preset second number, writing part of the acquired audio data objects from the position after the last audio data object up to the final position of the second cache space, and writing the remaining audio data objects from the first position of the second cache space; and determining the position of each audio data object in the second cache space as its identifier. Audio data objects are thus cached by multiplexing positions in the second cache space and distinguished by their identifiers. A sketch of such a circular write follows.
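A sketch of such a wrapped batch write (Java; the caller is assumed to track the next write position, and the array capacity is illustrative):

```java
// Sketch: write one batch into a loop array, wrapping to the front when fewer
// positions remain than the batch needs. Returns the next write position.
static int writeBatch(Object[] loopArray, int nextPos, Object[] batch) {
    int capacity = loopArray.length;
    int remaining = capacity - nextPos;
    if (remaining >= batch.length) {
        System.arraycopy(batch, 0, loopArray, nextPos, batch.length);
    } else {
        System.arraycopy(batch, 0, loopArray, nextPos, remaining);  // fill the tail
        System.arraycopy(batch, remaining, loopArray, 0,
                         batch.length - remaining);                 // reuse the front
    }
    return (nextPos + batch.length) % capacity;
}
```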
In a second aspect, the present application provides an electronic device, comprising: one or more processors; one or more memories; the memory stores one or more programs that, when executed by the processor, cause the electronic device to perform the video acquisition method described above.
In a third aspect, the present application provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the computer program causes the processor to execute the above-mentioned video acquisition method.
Drawings
FIG. 1 is a hardware block diagram of an electronic device provided in the present application;
FIG. 2 is a software architecture diagram of an electronic device provided in the present application;
FIG. 3 is a schematic diagram of a video acquisition method provided in the present application;
FIG. 4 is a flowchart of a video acquisition method provided in the present application;
FIG. 5 is a schematic diagram of writing audio data objects provided in the present application;
FIG. 6 is another schematic diagram of writing audio data objects provided in the present application;
FIG. 7 is a schematic diagram of determining a start identifier provided in the present application;
FIG. 8 is another schematic diagram of determining a start identifier provided in the present application;
FIG. 9 is a schematic diagram of processing a special audio object and a special video object provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. The terminology used in the following examples is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in the specification of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, such as "one or more", unless the context clearly indicates otherwise. It should also be understood that in the embodiments of the present application, "one or more" means one, two, or more; "and/or" describes an association relationship between associated objects and indicates that three relationships may exist. For example, "A and/or B" may represent: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather mean "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
In the embodiments of the present application, "a plurality of" means two or more. It should be noted that, in the description of the embodiments of the present application, the terms "first", "second", and the like are used only to distinguish the description and should not be construed as indicating or implying relative importance or order.
When an electronic device runs a video APP or a live-streaming APP, the device may record video data and audio data. If the device detects a key event (such as a highlight moment), it can extract the video data and audio data of the key event through the record-back service to obtain a video file of the key event. For example, if the user sees highlight content while the device is running a video APP, the user may click a record-back button in the video APP's page (a button is just one example of a trigger). After detecting the click on the record-back button, the device invokes the record-back service to record a video file of a preset duration: in one approach, the device extracts video data and audio data of the preset duration from what has been recorded and combines them into a video file whose playback duration is the preset duration. Through the timestamps of the video data and the audio data, the device can reduce or avoid the delay between video and audio and thereby achieve audio-video synchronization. In this timestamp-based approach, the device records a timestamp each time it records a piece of video data and each time it records a piece of audio data.
Before extracting the video data and the audio data, the electronic device determines the start time for video data extraction, compares that start time with the timestamp of each piece of audio data to determine the start time for audio data extraction, and then extracts video data and audio data according to the respective start times. In this way, the device takes a long time to determine the start time of the audio data, and video acquisition efficiency drops. The sketch below illustrates this baseline.
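For contrast, a hedged sketch of the timestamp-comparison baseline just described (Java; all names are illustrative). The scan over every buffered audio timestamp is what makes the baseline slow:

```java
// Hypothetical sketch of the timestamp-comparison baseline (not the patent's
// method). Scans every recorded audio timestamp to find the audio start
// closest to the video start time; cost grows with the amount of buffered audio.
static int findAudioStartByTimestamp(long[] audioTimestampsUs, long videoStartUs) {
    int best = 0;
    long bestDiff = Long.MAX_VALUE;
    for (int i = 0; i < audioTimestampsUs.length; i++) {   // linear scan
        long diff = Math.abs(audioTimestampsUs[i] - videoStartUs);
        if (diff < bestDiff) {
            bestDiff = diff;
            best = i;
        }
    }
    return best;   // index of the audio piece whose timestamp is nearest
}
```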
The application provides a video acquisition method that searches backward from the last video data object of the played video data (i.e., the recorded video data) for the start identifier of the video data, which points to the start time of video data extraction; determines the interval from that start identifier to the identifier of the last video data object; and uses the interval together with the identifier of the last audio data object of the played audio data to find the start identifier of the audio data, which points to the start time of audio data extraction. Compared with timestamp comparison, the electronic device can use the interval and the identifier of the last audio data object directly, so it finds the start time of audio data extraction quickly and improves video acquisition efficiency.
For example, in a scenario where the electronic device plays a video, the device may use the video acquisition method to record the video back and obtain a video file in which the video data and audio data are audio-video synchronized. The method can be applied whenever the device runs an APP that outputs both video frames (pictures) and sound, such as a video APP, a live-streaming APP, or a game APP.
In some examples, the video acquisition method fixes the number of audio data objects and the number of video data objects recorded (buffered) per unit time (e.g., per second) and gives each audio data object and each video data object a unique identifier. For example, audio data objects are buffered in an audio loop array and video data objects in a video loop array; each object has a unique array index, and the array index, used as the object's position in the array, uniquely identifies it. When determining the start times of the video data and the audio data, the start identifiers are located quickly as positions of data objects in the arrays, in such a way that the two start times are the same or close, for example with a difference within a preset range (e.g., less than 120 ms).
The video acquisition method may be applied to the electronic device shown in fig. 1, and in some embodiments, the electronic device may be a smart screen, a mobile phone, a tablet computer, a desktop computer, a laptop computer, a notebook computer, an ultra-mobile personal computer (UMPC), a handheld computer, a netbook, a Personal Digital Assistant (PDA), a wearable electronic device, a projector, or the like. The specific form of the electronic device is not particularly limited in the present application.
As shown in fig. 1, the electronic device may include: a processor, an external memory interface, an internal memory, a Universal Serial Bus (USB) interface, a charging management module, a power management module, a battery, antenna 1, antenna 2, a mobile communication module, a wireless communication module, a sensor module, keys, a motor, an indicator, an audio module, a camera, a display screen, a Subscriber Identity Module (SIM) card interface, and the like. The audio module may include a speaker, a receiver, a microphone, an earphone interface, etc., and the sensor module may include a pressure sensor, a gyroscope sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, etc.
It is to be understood that the illustrated structure of the present embodiment does not constitute a specific limitation to the electronic device. In other embodiments, an electronic device may include more or fewer components than illustrated, or combine certain components, or split certain components, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor may include one or more processing units, such as: the processor may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), among others. The different processing units may be separate devices or may be integrated into one or more processors. The processor is a nerve center and a command center of the electronic equipment, and the controller can generate an operation control signal according to the instruction operation code and the time sequence signal to finish the control of instruction fetching and instruction execution.
The external memory interface can be used for connecting an external memory card, such as a Micro SD card, so as to expand the storage capability of the electronic device. The external memory card communicates with the processor through the external memory interface to realize the data storage function. Such as storing video data and audio data, etc. in an external memory card. The internal memory may be used to store computer-executable program code, which includes instructions. The processor executes various functional applications of the electronic device and data processing by executing instructions stored in the internal memory. For example, in the present application, the processor causes the electronic device to execute the video acquisition method provided in the present application by executing instructions stored in the internal memory.
The display screen may be used to display images or video. The display screen includes a display panel. The display panel may be a Liquid Crystal Display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini light-emitting diode (Mini LED), a Micro light-emitting diode (Micro LED), a Micro OLED (Micro OLED), or a quantum dot light-emitting diode (QLED). In some embodiments, the electronic device may include 1 or N display screens, N being a positive integer greater than 1.
Loudspeakers, also known as "horns," are used to convert electrical audio signals into sound signals. The electronic device may broadcast audio data through a speaker, for example, when the electronic device plays video, the electronic device plays audio data through the speaker to restore sound in the video.
In addition, an operating system runs on the above components. Such as the iOS operating system developed by apple, the Android open source operating system developed by google, the Windows operating system developed by microsoft, and so on. A running application may be installed on the operating system.
The operating system of the electronic device may employ a layered architecture, an event-driven architecture, a micro-kernel architecture, a micro-service architecture, or a cloud architecture. The embodiments of the present application take an Android system with a layered architecture as an example to illustrate the software and hardware structure of the electronic device. Fig. 2 is a block diagram of the software structure of the electronic device. The layered architecture divides the software into several layers, each with a clear role and division of labor, and the layers communicate with each other through software interfaces. Taking the Android system as an example, in some embodiments the system is divided, from top to bottom, into an application layer, an application framework layer (Framework), a hardware abstraction layer (HAL), a kernel layer, and a hardware layer.
The application layer may comprise a series of application packages, such as video APPs and live-streaming APPs. The application layer may further comprise service APPs that provide services to the APPs in the application packages; when an APP runs, the service APP serving it runs as well. For example, the application layer may include a television service APP (TvServiceTv) and a screen recording APP (WonderfulTv). TvServiceTv is used during video playback by APPs such as video APPs and live APPs, while WonderfulTv is used for screen recording and record-back. Record-back means recording a highlight moment of the video after the fact; it can be triggered by the user, or the electronic device can trigger it automatically after detecting a highlight moment.
The application framework layer provides an application programming interface (API) and a programming framework for the applications in the application layer, and includes a number of predefined functions. For example, the application framework layer may include an audio policy executor (AudioFlinger), a media player (MediaPlayer), and a windowing system (SurfaceFlinger). AudioFlinger is used to output audio data, MediaPlayer to output video data, and SurfaceFlinger to output interface data, such as progress bar data indicating the playing progress of the video.
The HAL layer and kernel layer provide call interfaces for the upper layers and receive the data they transmit. For example, these layers include an audio output (AO) module, a video output (VO) module, an on-screen display (OSD) module, and a mixing (MIX) module. AudioFlinger outputs audio data to the AO module, from which it can be played through a speaker; MediaPlayer outputs video data to the VO module, from which it can be shown on the display screen; SurfaceFlinger outputs interface data to the OSD module, from which it can be shown on the display screen, e.g., to display the playing progress while the screen shows a picture (an image restored from the video data). The MIX module can interact with the VO module and the OSD module to mix the video data and the interface data: the video data restores an image (picture), the interface data restores an interface, and mixing the two adds the interface to the image, that is, it produces an image with an interface, which can be regarded as adding the interface data to the video data.
The HAL layer and kernel layer may also include functions or interfaces; for example, they may include an audio capture interface (AudioCapture) that captures data from the AO module and a screen capture interface (ScreenCapture) that captures data from the MIX module.
The driver layer may provide device drivers, for example, the driver layer includes drivers such as a High Definition Multimedia Interface (HDMI), a Digital TV (DTV), and an Audio and Video (AV).
The interaction flow among the layers of the software structure is shown in fig. 3. When the electronic device runs a video APP to play video, the video APP may send video play signals to AudioFlinger, MediaPlayer, and SurfaceFlinger, which respond to them respectively: AudioFlinger sends the video's audio data to the AO module, which may save it; MediaPlayer sends the video data (also referred to as image data or picture data) to the VO module, which may save it; SurfaceFlinger sends the interface data (e.g., progress bar data) to the OSD module, which may save it. When the electronic device runs a live APP, the live APP can send live broadcast signals to the HDMI/DTV/AV drivers of the driver layer, which decode audio data to the AO module and video data to the VO module.
When a video APP or live APP plays video, a display signal is sent through the VO module, and when playback ends, a display-end signal is sent through the VO module. TvServiceTv may listen for the display signal and the display-end signal; if it detects the display signal, TvServiceTv sends a display notification to WonderfulTv, and WonderfulTv, in response, runs its screen recording service (ScreenRecordService). ScreenRecordService starts recording audio and video and can call the interfaces wrapped by TvServiceTv to acquire them: for example, it calls AudioCapture and ScreenCapture through TvServiceTv. AudioCapture acquires audio data from the AO module, while ScreenCapture acquires mixed data from the MIX module; the mixed data is obtained at least from the video data in the VO module and, if corresponding interface data exists in the OSD module, from the video data and interface data together.
During playback, ScreenRecordService keeps calling AudioCapture and ScreenCapture through TvServiceTv to record the audio and video. When the user performs a record-back operation at a highlight moment, WonderfulTv detects the record-back signal and, in response, calls the highlight-moment recording interface (RecordMoment) of a recording manager (RecordManager), writing the video data and audio data to be recorded into an audio/video synthesizer (MediaMuxer); MediaMuxer synthesizes the video data and the audio data into a video file, the video data and audio data having the same or similar start times. The RecordManager may be provided by the video APP or live APP in the application layer, RecordMoment is an interface exposed by the RecordManager, and MediaMuxer may be a standard file-synthesis tool available to the application layer. A hedged sketch of the synthesis step follows.
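As a rough sketch of the synthesis step, Android's MediaMuxer API could be driven as follows; the track formats and the already-extracted sample buffers are assumed to come from the recording pipeline described above, and error handling is omitted:

```java
import android.media.MediaCodec;
import android.media.MediaFormat;
import android.media.MediaMuxer;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.List;

final class RecordBackMuxer {
    // Writes already-extracted video and audio samples into an MP4 container.
    // Formats and BufferInfo lists (with presentation times) are assumed to
    // come from the recording pipeline described above.
    static void mux(String outPath, MediaFormat videoFormat, MediaFormat audioFormat,
                    List<ByteBuffer> videoSamples, List<MediaCodec.BufferInfo> videoInfos,
                    List<ByteBuffer> audioSamples, List<MediaCodec.BufferInfo> audioInfos)
            throws IOException {
        MediaMuxer muxer = new MediaMuxer(outPath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4);
        int videoTrack = muxer.addTrack(videoFormat);
        int audioTrack = muxer.addTrack(audioFormat);
        muxer.start();
        for (int i = 0; i < videoSamples.size(); i++) {
            muxer.writeSampleData(videoTrack, videoSamples.get(i), videoInfos.get(i));
        }
        for (int i = 0; i < audioSamples.size(); i++) {
            muxer.writeSampleData(audioTrack, audioSamples.get(i), audioInfos.get(i));
        }
        muxer.stop();
        muxer.release();
    }
}
```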
Fig. 4 shows the flow from ScreenRecordService starting to record audio and video through MediaMuxer synthesizing the video file, taking a smart screen playing a video as an example of the video acquisition method. The flow may include the following steps:
step S101, the intelligent screen plays the video, and the intelligent screen starts a screenRecordService.
Step S102: ScreenRecordService calls the interfaces wrapped by TvServiceTv to obtain the audio and video; TvServiceTv wraps JNI interfaces such as an AudioCapture interface for obtaining audio data and a ScreenRecord interface for obtaining video data.
TvServiceTv may initialize the video recording service; for example, it may initialize the number of video data objects recorded per unit time, the frame rate, and so on. If TvServiceTv initializes recording to S video frames per second, the frame rate determines the number of video frames (i.e., video data objects) collected per second, so once the number of video frames is initialized the frame rate can be determined from it. The number of audio data objects and the duration of each piece of audio data may be set in advance so that each piece has a fixed duration, and may likewise be initialized by TvServiceTv, for example recording A pieces of audio data per second. The values of A and S are greater than 1 and may differ from each other, for example A = 15 and S = 30, which is not limited in this embodiment.
Step S103: AudioCapture acquires A pieces of audio data per second, which form an audio data array. After AudioCapture finishes one second of audio acquisition, the audio data array is stored in an audio buffer; each piece of audio data cached in the audio buffer is called an audio data object (AudioFrame). The audio data in the audio data array and the audio buffer are sorted in acquisition order, i.e., earlier-acquired audio data comes earlier in the array. Because the acquisition order also reflects the order of the audio during video playback, the positional relationship between pieces of audio data matches their order during playback: the earlier a piece of audio is played, the earlier it appears in the audio data array.
AudioCapture may acquire the audio data based on the preset duration of each piece, so that the A pieces acquired each second have the same duration. For example, if 10 pieces are acquired per second with a duration of 100 ms each, AudioCapture may acquire one piece every 100 milliseconds (ms), splitting the second of audio as it is captured, one piece per fixed interval (e.g., 100 ms).
In some examples, TvServiceTv may initialize the number of audio data objects, and AudioCapture may split a captured piece of audio after completing one second of capture, obtaining as many pieces as the initialized number of audio data objects. If AudioCapture acquires a piece of audio 640 ms long within one second and the number of audio data objects is 10, it may divide the 640 ms of audio into 10 pieces of 64 ms each; uniform division is only an example, and the division may also be non-uniform. A sketch of such a split follows.
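A minimal sketch of such a uniform split over raw PCM bytes (the byte-array representation is an assumption; the patent only fixes the piece count):

```java
// Sketch: split one captured second of PCM audio into a fixed number of
// equal-duration pieces, per the 640 ms -> 10 x 64 ms example above.
static java.util.List<byte[]> splitAudio(byte[] pcm, int pieceCount) {
    java.util.List<byte[]> pieces = new java.util.ArrayList<>(pieceCount);
    int pieceLen = pcm.length / pieceCount;          // uniform split (an example)
    for (int i = 0; i < pieceCount; i++) {
        int from = i * pieceLen;
        // The last piece absorbs any remainder so no samples are dropped.
        int to = (i == pieceCount - 1) ? pcm.length : from + pieceLen;
        pieces.add(java.util.Arrays.copyOfRange(pcm, from, to));
    }
    return pieces;
}
```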
Step S104: the audio data objects in the audio buffer are written into the audio loop array for storage. Their ordering is unchanged by the write: each newly written audio data object is placed after the previously written ones, in acquisition order. This guarantees that the audio data objects in the audio loop array are ordered by acquisition, the same ordering the audio has during video playback, so that when audio data is extracted it can be extracted in the order it holds in the audio loop array, ensuring the accuracy of the extracted audio.
For example, if 15 pieces of audio data are acquired per second, they may be sorted in acquisition order as shown in fig. 5: the first acquired piece at position 1, the second at position 2, and so on, with the last piece acquired in the second at position 15. These 15 pieces form an audio data array stored in the audio buffer, and the audio data objects in the audio buffer are written into the audio loop array in that order. In fig. 5, the audio loop array has been written up to position 30, so the audio data objects ordered 1 to 15 in the audio buffer are written into positions 31 to 45. The array index of a piece of audio data is obtained from its position in the audio loop array: if its position is 31, its array index is 30, and the audio loop array element [30] refers to the audio data at position 31. The array index of an audio data object in the audio loop array is its unique identifier. While audio data objects are being written from the audio buffer into the audio loop array, AudioCapture may keep acquiring audio data, and after another 15 pieces are acquired they are again written into the audio buffer in order.
The audio loop array may have a maximum position, and positions may be multiplexed when the remaining positions do not satisfy the write requirement of the audio data objects, i.e., when fewer positions remain in the audio loop array than the total number of audio data objects in the audio buffer. For example, as shown in fig. 6, suppose the audio loop array stores at most 1000 audio data objects and has been written up to position 990, leaving 10 positions. The audio data objects at positions 1 to 10 of the audio buffer are written into positions 991 to 1000, while the objects at positions 11 to 15 reuse the first 5 positions of the audio loop array; however, the array indices of those 5 objects are 1001 to 1005, and their positions can likewise be treated as 1001 to 1005, indicating both that the positions are multiplexed and that these 5 objects were acquired later. In this case the position of an audio data object in the audio loop array can still serve as its unique identifier. If instead the first 5 positions kept the values 1 to 5, using position alone as the identifier could cause data read errors; in that case the position and the acquisition time of the audio data object together serve as its identifier, the acquisition time distinguishing acquisition order so that the search direction for the audio start identifier can be determined when positions are multiplexed. A sketch of this indexing follows.
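A sketch of this identifier scheme: the physical position wraps around while the logical array index keeps increasing, so the index remains a unique identifier even when positions are multiplexed (Java; names and the zero-based counter are illustrative assumptions):

```java
// Sketch: a loop array whose ever-increasing logical index is the unique
// identifier, matching the positions-1001-to-1005 example above.
final class LoopArray<T> {
    private final Object[] slots;
    private long nextIndex = 0;   // never reused: the unique identifier

    LoopArray(int capacity) { slots = new Object[capacity]; }

    long write(T item) {
        long id = nextIndex++;
        slots[(int) (id % slots.length)] = item;   // physical position is multiplexed
        return id;                                  // caller keeps the identifier
    }

    @SuppressWarnings("unchecked")
    T read(long id) { return (T) slots[(int) (id % slots.length)]; }
}
```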
In some examples, after the audio data objects in the audio buffer have been written into the audio cycle array, the audio buffer is emptied in preparation for storing audio data objects again. One point to note: if the total duration of the audio data within one second is less than one second, the AudioCapture, which acquires one audio data at fixed intervals, may leave some audio data empty; the acquisition order is nevertheless determined by the order in which the audio data occur within that second, so the audio data in the audio buffer are sorted by their respective playing time points within the second. After being written into the audio cycle array in acquisition order, the audio data are in effect stored according to the actual playing time point of each audio data, i.e. the time point at which the audio occurs during video playback. Although the audio cycle array does not record a timestamp for each audio data, its storage order follows the actual playing time points. When multiple audio data are extracted from the audio cycle array, they can be restored into a segment of audio in the order they hold in the array, and the restored audio is the same as the audio heard during video playback, which improves accuracy.
Step S105: the ScreenRecord acquires S video frames per second; the S video frames form a video data array, and the video data array is stored in the video buffer. A video frame cached in the video buffer is referred to as a video frame object (ScreenFrame), or alternatively as a video data object. If the video frame in a video frame object is an I frame, the video frame object also includes an I frame identifier.
If the ScreenRecord determines that a video frame is an I frame, it adds an I frame identifier to the video frame. For example, a Boolean flag may serve as the I frame identifier: if the video frame is an I frame, true is added to it; if the video frame is a P frame or a B frame, false is added, indicating that it is not an I frame. The I frame identifier may be added when (or before) the ScreenRecord writes the video data array into the video buffer, or the ScreenRecord may determine whether each video frame is an I frame as soon as it is collected, so that the video frames in the video data array already carry their I frame identifiers.
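A minimal sketch of such a video frame object (ScreenFrame is the patent's name; the fields shown are assumptions):

```java
/** A captured video frame plus the Boolean I-frame identifier described above. */
class ScreenFrame {
    final byte[] encodedData;   // compressed frame bytes from the encoder
    final boolean isIFrame;     // true for an I frame, false for a P or B frame
    final long captureTimeUs;   // acquisition time, usable when positions are multiplexed

    ScreenFrame(byte[] encodedData, boolean isIFrame, long captureTimeUs) {
        this.encodedData = encodedData;
        this.isIFrame = isIFrame;
        this.captureTimeUs = captureTimeUs;
    }
}
```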
The video frames in the video data array may be sorted by acquisition order; for example, the earlier a video frame is acquired, the earlier it is placed in the video data array. The video frame objects in the video buffer follow the order of their video frames in the video data array, so the video frames are ordered identically in the video data array and the video buffer. Because the acquisition order also reflects the order in which the video frames appear when the video is played, sorting by acquisition order makes the positional relationship among the video frames the same as during playback. This positional relationship represents the playing order of the video frames (equivalently, their acquisition order) and can also represent their display order on the screen: the earlier a video frame is displayed, the earlier it appears in the video data array.
In some examples, the TvService may initialize the number of video frame objects and the frame rate, and the ScreenRecord collects video frames according to the frame rate, for example 30 video frames per second; the value of S is determined accordingly and is not described again here.
Step S106: the video frame objects in the video buffer are written into a video cycle array for storage. The ordering of the video frame objects is unchanged during writing: each newly written video frame object is placed after the previously written ones, in acquisition order. This ensures that the video frame objects in the video cycle array are ordered by acquisition sequence, which is the same as the order of the video frames when the video is played, so that when video frames are extracted they can be extracted in the order they hold in the video cycle array, ensuring the accuracy of the extracted video frames.
The process of writing video frame objects into the video cycle array is similar to that of writing audio data objects into the audio cycle array; see the example of fig. 5, which is not repeated here. In some examples, after the video frame objects in the video buffer have been written into the video cycle array, the video buffer is emptied in preparation for storing video frame objects again. The array index of a video frame object in the video cycle array can serve as its unique identifier. The video cycle array may also have a maximum capacity, and positions may be multiplexed if the remaining positions cannot accommodate the video frame objects to be written, as described in the example of fig. 6 and not repeated here.
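Reusing the hypothetical CycleArray and ScreenFrame sketched above, both streams can share the same structure; the capacities below are illustrative, matching the fig. 6 example:

```java
import java.util.ArrayList;
import java.util.List;

class RecorderBuffers {
    // Illustrative capacities of 1000 slots, as in the fig. 6 example.
    final CycleArray<byte[]> audioCycleArray = new CycleArray<>(1000);
    final CycleArray<ScreenFrame> videoCycleArray = new CycleArray<>(1000);

    final List<byte[]> audioBuffer = new ArrayList<>();      // filled by AudioCapture, A per second
    final List<ScreenFrame> videoBuffer = new ArrayList<>(); // filled by ScreenRecord, S per second

    /** Called once per second: flush both buffers into the cycle arrays, then empty them. */
    void flush() {
        audioCycleArray.writeAll(audioBuffer);
        videoCycleArray.writeAll(videoBuffer);
        audioBuffer.clear();
        videoBuffer.clear();
    }
}
```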
Step S107: the WonderfulTv detects a user-triggered highlight moment and calls the highlight-moment recording interface (recordMoment) of the recording manager (RecordManager) to determine the start identifier of the video data and the interval gap between the video frame corresponding to the start identifier and the last video frame. The WonderfulTv detecting a user-triggered highlight moment means that the user has triggered the recording service; in this case the WonderfulTv may call the recordMoment of the RecordManager to extract the video data and the audio data.
Before extracting the video data and the audio data, the recordMoment needs to determine the start identifier of the video data. As shown in fig. 4, when the user-triggered highlight moment is detected, the current writing position of the video cycle array is SN, i.e. the last video frame written into the video cycle array sits at position SN. The recordMoment searches for the first video frame object with an I frame identifier starting from the position S×T before position SN, and uses the position of that video frame object in the video cycle array as the start identifier of the video data.
T is the recording duration and S is the number of video frame objects acquired per second, so S×T is the number of video frame objects acquired within the duration T. Assuming the S×T-th position before position SN is position SI, the interval from position SI to position SN spans the duration T; the recordMoment therefore takes position SI as the starting position and searches from there toward position SN for the first video frame object with an I frame identifier.
The video frame in the first video frame object with an I frame identifier is an I frame. An I frame can be decompressed into a picture by a video decompression algorithm without reference to any P frame or B frame, and the picture information is completely retained in the I frame. Taking the position of the first video frame object with an I frame identifier in the video cycle array as the start identifier of the video data therefore makes the first extracted video frame an I frame, so the first picture can be restored accurately, and the P frames or B frames after the I frame can be decompressed with reference to that I frame, improving accuracy.
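A sketch of this backward-window search, under the same assumptions as the earlier sketches (hypothetical CycleArray with the logical array index standing in for the position, and S×T assumed not to exceed the array capacity):

```java
class VideoStartFinder {
    /**
     * Scans from the position S*T before the current writing position SN
     * toward SN for the first frame flagged as an I frame.
     * Returns -1 if no I frame lies in the window.
     */
    static long findVideoStartIndex(CycleArray<ScreenFrame> videoCycleArray, int s, int t) {
        long sn = videoCycleArray.lastIndex();    // position of the last written frame
        long si = Math.max(0, sn - (long) s * t); // clamp during the first T seconds of capture
        for (long i = si; i <= sn; i++) {
            if (videoCycleArray.get(i).isIFrame) {
                return i;                         // start identifier of the video data
            }
        }
        return -1;
    }
}
```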
Step S108: the recordMoment extracts the video frame objects from the start identifier to position SN. Since the first video frame object with an I frame identifier is at or after position SI, the duration covered by the extracted video frame objects is less than or equal to T, which meets the user's requirement of recording back from position SN.
In some examples, the recordMoment may directly use the position S×T before position SN as the start identifier of the video data; the duration covered by the extracted video frame objects is then exactly T, which also meets the user's requirement of recording back from position SN. However, the first extracted video frame may then be a B frame or a P frame, which must be decompressed with reference to an I frame; because it is the first video frame, there is no I frame to refer to, so the probability of errors when decompressing that B frame or P frame increases and accuracy decreases.
In some examples, the recordMoment may instead search forward (toward earlier frames) from the S×T-th position before position SN for the first video frame with an I frame identifier. Assuming the recordMoment finds the first video frame with an I frame identifier at position SH ahead of that point, the duration from position SH to position SN is greater than T. If position SH were used as the start identifier of the video data, then, to keep the recording duration at T, the recordMoment would stop extracting at a position SZ such that the duration from position SH to position SZ equals T; the video frame objects from position SZ to position SN would not be extracted, so the user's requirement of recording up to position SN could not be met.
Step S109: the recordMoment determines the start identifier of the audio data based on the current writing position AN of the audio cycle array and the interval gap between the start identifier of the video data and position SN. gap is the number of video frame objects from the start identifier of the video data to position SN; the recordMoment may use the position A×(gap/S) before position AN as the start identifier of the audio data, where (gap/S) is the duration covered by the extracted video frame objects.
Step S110: the recordMoment extracts the audio data objects from the start identifier of the audio data to position AN.
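Under the same assumptions as the earlier sketches (hypothetical names; positions treated as logical indices), the audio start identifier can be computed from the video gap; rounding up covers the non-integer case discussed further below:

```java
class AudioStartFinder {
    /**
     * gap video frames span gap/S seconds, corresponding to A*(gap/S) audio
     * data objects; the start identifier of the audio data is that many
     * positions before the audio writing position AN.
     */
    static long findAudioStartIndex(long videoStartIndex, long sn, long an, int a, int s) {
        long gap = sn - videoStartIndex;                    // video frames from start identifier to SN
        long back = (long) Math.ceil((double) a * gap / s); // A*(gap/S) audio data objects, rounded up
        return Math.max(0, an - back);                      // start identifier of the audio data
    }
}
```

Because the same gap drives both extractions, the audio and video windows end at the same moment and span (nearly) the same duration, which is what keeps sound and picture aligned.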
The video frame objects and audio data objects extracted by the recordMoment are written into a MediaMuxer, which synthesizes them into a Moving Picture Experts Group 4 (MP4) file, i.e. a video file in MP4 format. The MP4 file may be stored in a memory of the electronic device and is the video file of a sub-video of the video played in step S101.
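A sketch of the muxing step using Android's MediaMuxer API (MediaMuxer, addTrack, writeSampleData, start, stop, and release are the actual Android calls; the Sample holder and the surrounding method are assumptions):

```java
import android.media.MediaCodec;
import android.media.MediaFormat;
import android.media.MediaMuxer;
import java.io.IOException;
import java.nio.ByteBuffer;

/** Hypothetical holder for one extracted video frame or audio data object. */
class Sample {
    boolean isVideo;
    ByteBuffer data;            // encoded sample bytes
    MediaCodec.BufferInfo info; // presentation time, size, flags
}

class Mp4Writer {
    /** Muxes the extracted, ordered samples into an MP4 file. */
    static void muxToMp4(String outputPath, MediaFormat videoFormat, MediaFormat audioFormat,
                         Iterable<Sample> samples) throws IOException {
        MediaMuxer muxer = new MediaMuxer(outputPath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4);
        int videoTrack = muxer.addTrack(videoFormat);
        int audioTrack = muxer.addTrack(audioFormat);
        muxer.start();
        for (Sample s : samples) { // extracted objects, in cycle-array order
            muxer.writeSampleData(s.isVideo ? videoTrack : audioTrack, s.data, s.info);
        }
        muxer.stop();
        muxer.release();
    }
}
```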
How the start identifier of the video data and the start identifier of the audio data are determined is described below with an example in which A = 15, S = 30, and T = 15 s. Fig. 7 shows an example of determining the two start identifiers. In fig. 7, when the recordMoment detects the user-triggered highlight moment, the current writing position of the video cycle array is 500 and S×T = 450, so the recordMoment searches for the first video frame object with an I frame identifier starting from the 450th position before position 500, i.e. from position 50 toward position 500. Assuming the recordMoment finds the first video frame object with an I frame identifier at position 60 (marked by the I frame identifier in fig. 7), it extracts the video frame objects starting at position 60 and ending at position 500; position 60 is the start identifier and position 500 the end identifier of the extracted video data.
The interval between position 60 and position 500 is gap = 440, and A×(gap/S) = 15×(440/30) = 220. The current writing position of the audio cycle array is 300, so the audio data objects are extracted starting from the 220th position before position 300, i.e. from position 80 to position 300; position 80 is the start identifier and position 300 the end identifier of the extracted audio data. The interval from position 80 to position 300 is 220, and 220/15 = 440/30, so the extracted video data and audio data have the same duration. Position SN and position AN correspond to the same moment, meaning the end time of the video data equals the end time of the audio data; since the durations are equal, the start times are equal as well, i.e. the start identifier of the video data and the start identifier of the audio data correspond to the same time point. Extracting video data and audio data of equal duration from the same start time keeps the audio and the picture in the resulting video file synchronized.
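Plugging the fig. 7 numbers into the hypothetical helper above confirms the arithmetic:

```java
public class Fig7Check {
    public static void main(String[] args) {
        // fig. 7: S = 30, A = 15; video written to position 500, first I frame at 60;
        // audio written to position 300.
        long gap = 500 - 60;                           // 440 video frame objects
        long back = (long) Math.ceil(15.0 * gap / 30); // A*(gap/S) = 220 audio data objects
        long audioStart = 300 - back;                  // position 80, matching the text
        System.out.println(gap + ", " + back + ", " + audioStart); // prints 440, 220, 80
    }
}
```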
Fig. 8 shows another example of determining the start identifier of the video data and the start identifier of the audio data; fig. 8 covers the position-multiplexing scenario. In fig. 8, positions 1 to 10 of the video cycle array are multiplexed and the current writing position of the video cycle array is 1010; positions 1 to 5 of the audio cycle array are multiplexed and the current writing position of the audio cycle array is 1005.
Since S × T =450, then the 450 th position before position 1010 in the video loop array is position 660, recordmoment starts from position 660, and the first video frame object with I frame identification is looked up between position 660 to position 1010. Assuming that recordMoment extracts video frame objects starting at location 680 and ending at location 1010, from location 680 to the first video frame object with an I frame identification. The gap between position 680 and position 1010 is gap =330, a (gap/S) =15 (330/30) =165. The current writing position of the audio loop array is 1005, and the audio data object is extracted from the 165 th position before the position 1005, that is, from the position 840 to the position 1005.
In the example of fig. 7, the extracted duration is 440/30 ≈ 14.67 s, which is less than 15 s; in the example of fig. 8, the duration extracted by the recordMoment is 330/30 = 11 s, also less than 15 s, and the two extractions differ in duration. Of course, the recordMoment may also determine the start identifier in other ways. Taking the video cycle array shown in fig. 7 as an example, the recordMoment may use the 450th position before position 500 as the start identifier of the video data and extract the video frame objects from position 50 to position 500; the extracted duration is then (500 − 50)/30 = 15 s, the same as the set duration T.
If A×(gap/S) is not an integer, the recordMoment may obtain an integer result by rounding, rounding down, or rounding up. For example, if A×(gap/S) = 17.8, the recordMoment would have to extract audio data objects from the 17.8th position before the current writing position, but 17.8 is not an integer and cannot locate an audio data object. The recordMoment therefore rounds the value up and extracts the audio data objects from the 18th position before the current writing position; the start time of the extracted audio data then deviates from the start time of the video data by less than the duration of one audio data object, so the two start times remain close.
In some examples, if the AudioCapture acquires more than A audio data per second, the audio data acquired after the A-th one are discarded (i.e. deleted); if the AudioCapture acquires fewer than A audio data per second, special audio objects are used as padding so that A audio data are obtained every second. A special audio object may be empty audio data or audio data with fixed content, and it carries a special mark so that it can be distinguished when audio data are extracted.
Likewise, if the ScreenRecord acquires more than S video frames per second, the video frames acquired after the S-th one are discarded (i.e. deleted); if the ScreenRecord acquires fewer than S video frames per second, special video objects are used as padding so that S video frames are obtained every second. A special video object may be an empty video frame or a video frame with fixed content, and it carries a special mark so that it can be distinguished when video frames are extracted.
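A sketch of this per-second normalization, applicable to both the audio buffer (perSecond = A) and the video buffer (perSecond = S); the method name and padding parameter are assumptions:

```java
import java.util.List;

class PerSecondNormalizer {
    /** Trims or pads one second's worth of captured items to exactly perSecond entries. */
    static <T> void normalizePerSecond(List<T> buffer, int perSecond, T specialPadding) {
        while (buffer.size() > perSecond) {
            buffer.remove(buffer.size() - 1); // discard items acquired after the perSecond-th
        }
        while (buffer.size() < perSecond) {
            buffer.add(specialPadding);       // pad with a specially marked object
        }
    }
}
```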
For scenarios with special audio objects and special video objects, when the recordMoment extracts video frames and audio data objects: if an audio data object to be extracted is a special audio object, the recordMoment may skip it (i.e. not extract it), or extract it and then discard it; if a video data object to be extracted is a special video object, the recordMoment may skip it, or adjust it based on a number of video frames adjacent to it.
In some examples, the recordMoment may adjust the special video object based on a number of video frames before and after it. For example, using the 5 video frames before the special video object and the 5 video frames after it, the image data of these 10 video frames are used to derive the image data of the special video object, yielding one video frame. For instance, the average of the pixel values of the same pixel across the 10 video frames becomes the pixel value of the corresponding pixel in the special video object; once all pixels have been averaged, the image data of the special video object is obtained.
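A sketch of this pixel-averaging adjustment, assuming the neighboring frames have been decoded into equal-length byte arrays with one byte per pixel channel (names are hypothetical):

```java
import java.util.List;

class SpecialFrameAdjuster {
    /**
     * Replaces a special (padding) frame by averaging, pixel by pixel, the decoded
     * frames around it (e.g. the 5 frames before and the 5 frames after it).
     */
    static byte[] averageNeighborFrames(List<byte[]> neighborFrames) {
        int length = neighborFrames.get(0).length;
        byte[] result = new byte[length];
        for (int i = 0; i < length; i++) {
            int sum = 0;
            for (byte[] frame : neighborFrames) {
                sum += frame[i] & 0xFF;                       // unsigned channel value
            }
            result[i] = (byte) (sum / neighborFrames.size()); // per-channel average
        }
        return result;
    }
}
```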
In some examples, the recordMoment may adjust the special video object based only on a number of video frames before it; in other examples, based only on a number of video frames after it. The process is as described above and is not repeated here.
If, while adjusting a special video object, the frames read before and after it include another special video object, the recordMoment skips that one; alternatively, the special video object encountered during the adjustment may be adjusted first, and the special video object to be processed is then adjusted using the ordinary video frames together with the already-adjusted special video object. The recordMoment may choose how to process the special video objects in the video cycle array according to current power consumption, resource consumption, and the like; for example, it may perform the adjustment when the CPU is idle.
Fig. 9 shows an example of the recordMoment's handling of special audio objects and special video objects: a special audio object is skipped, while a special video object is adjusted using the 5 video frames before it and the 5 video frames after it.
Furthermore, the present application provides an electronic device comprising: one or more processors; one or more memories; the memory stores one or more programs that, when executed by the processor, cause the electronic device to perform the video acquisition method described above.
The present application provides a computer-readable storage medium having stored therein a computer program which, when executed by a processor, causes the processor to execute the above-described video acquisition method.

Claims (17)

1. A method for video acquisition, the method comprising:
playing a first video;
receiving a recording signal for the first video during playing of the first video, wherein the recording signal indicates that a second video of the first video is to be generated using played video data and played audio data of the first video;
responding to the recording signal, and searching the played video data for a target video data object whose distance from the last video data object in the played video data is a recording duration;
determining a first start identifier at least according to the identifier of the target video data object;
taking the video data object in the played video data pointed to by the first start identifier as a first frame, and, starting from the first frame, acquiring video data from the played video data;
determining an interval between the first start identifier and the identifier of the last video data object;
searching the played audio data for an audio data object spaced by the interval from the last audio data object of the played audio data, and taking the identifier of the found audio data object as a second start identifier;
taking the audio data object in the played audio data pointed to by the second start identifier as a first audio segment, and, starting from the first audio segment, acquiring audio data from the played audio data;
generating the second video, the second video comprising the video data and the audio data.
2. The method of claim 1, wherein determining a first start identifier based on at least an identifier of the target video data object comprises:
if the target video data object is an I frame, determining the identifier of the target video data object as the first start identifier;
if the target video data object is a B frame or a P frame, searching the played video data, at least according to the target video data object, for a first video data object with an I frame identifier corresponding to the target video data object, and determining the identifier of the first video data object with the I frame identifier as the first start identifier.
3. The method of claim 2, wherein the searching the played video data, at least according to the target video data object, for a first video data object with an I frame identifier if the target video data object is a B frame or a P frame comprises:
if the target video data object is a B frame or a P frame, searching for the first video data object with the I frame identifier corresponding to the target video data object, starting from the target video data object toward the last video data object.
4. The method of claim 3, wherein the video data object in the played video data pointed to by the first start identifier is a first frame, and acquiring video data from the played video data starting from the first frame comprises:
acquiring video data objects from the played video data, starting from the first frame and ending at the last video data object; and combining the acquired video data objects to obtain video data, wherein the duration of the video data is less than or equal to the recording duration.
5. The method of claim 2, wherein the searching the played video data, at least according to the target video data object, for a first video data object with an I frame identifier if the target video data object is a B frame or a P frame comprises:
if the target video data object is a B frame or a P frame, searching for the first video data object with the I frame identifier corresponding to the target video data object, starting from the target video data object toward the first video data object in the played video data.
6. The method of claim 5, wherein the video data object in the played video data pointed to by the first start identifier is a first frame, and acquiring video data from the played video data starting from the first frame comprises: acquiring video data objects from the played video data, starting from the first frame and ending at the video data object spaced the recording duration after the first frame, and combining the acquired video data objects to obtain video data;
or
the video data object in the played video data pointed to by the first start identifier is a first frame, and acquiring video data from the played video data starting from the first frame comprises: acquiring video data objects from the played video data, starting from the first frame and ending at the last video data object; and combining the acquired video data objects to obtain video data, wherein the duration of the video data is greater than the recording duration.
7. The method according to claim 4 or 6, further comprising: if an acquired video data object is a special video object, discarding the special video object;
or
if an acquired video data object is a special video object, adjusting the special video object according to video data objects related to the special video object in the played video data, wherein the adjusted special video object is used for obtaining the video data.
8. The method according to claim 7, wherein said adjusting the special video object according to the video data object related to the special video object in the played video data comprises:
and adjusting the special video object according to N video data objects positioned in front of the special video object and/or M video data objects positioned behind the special video object in the played video data, wherein N and M are natural numbers larger than or equal to 1.
9. The method according to any one of claims 1 to 8, wherein the audio data object in the played audio data pointed to by the second start identifier is a first audio segment, and acquiring audio data from the played audio data starting from the first audio segment comprises:
acquiring audio data objects from the played audio data, starting from the first audio segment and ending at the last audio data object; and combining the acquired audio data objects to obtain audio data.
10. The method of claim 9, further comprising: if an acquired audio data object is a special audio object, discarding the special audio object.
11. The method according to any one of claims 1 to 10, wherein the played video data is cached in a first cache space, the location of the last video data object in the first cache space is a first location, and the first location is an identifier of the last video data object;
the searching, in response to the re-recording signal, a target video data object having a distance to a last video data object in the played video data as a re-recording duration from the played video data includes: determining video data objects which are located at the S x T positions before the first position in the first cache space and belong to the first video as the target video data objects, wherein S is the number of the video data objects cached in unit time, and T is the recording duration;
the determining a first start identifier at least according to the identifier of the target video data object comprises: searching the first cache space, starting from the target video data object, for the first video data object with an I frame identifier that belongs to the first video, and acquiring the position of the first video data object with the I frame identifier in the first cache space; and determining the position of the first video data object with the I frame identifier in the first cache space as the first start identifier.
12. The method of claim 11, wherein the played audio data is cached in a second cache space, and wherein the location of the last audio data object in the second cache space is a second location, and wherein the second location is an identifier of the last audio data object;
the determining an interval between the first start identifier and the identifier of the last video data object comprises: determining a position difference between the position corresponding to the first start identifier and the first position as the interval;
the searching the played audio data for an audio data object spaced by the interval from the last audio data object of the played audio data, and taking the identifier of the found audio data object as the second start identifier comprises: determining a total number B of audio data objects according to A×(gap/S), wherein A is the number of audio data objects cached per unit time, and gap is the interval; and searching the second cache space for the audio data object which is located at the B-th position before the second position and belongs to the first video, and determining the position of the found audio data object in the second cache space as the second start identifier.
13. The method of any one of claims 1 to 12, wherein before the responding to the recording signal, the method further comprises: acquiring video data objects and audio data objects in each unit time;
if the number of the video data objects acquired in unit time is larger than a preset first number, deleting part of the video data objects, wherein the number of the remaining video data objects is the preset first number, and the preset first number is the number of the video data objects which can be cached in unit time;
if the number of the video data objects acquired in unit time is smaller than the preset first number, adding a special video object, wherein the special video object is identified by a special mark, and the sum of the number of the added special video object and the number of the acquired video data objects is the preset first number;
if the number of the audio data objects acquired in the unit time is larger than a preset second number, deleting part of the audio data objects, wherein the number of the remaining audio data objects is the preset second number, and the preset second number is the number of the audio data objects which can be cached in the unit time;
and if the number of the audio data objects acquired in unit time is less than the preset second number, adding a special audio object, wherein the special audio object is identified by a special mark, and the sum of the number of the added special audio object and the number of the acquired audio data objects is the preset second number.
14. The method of claim 13, further comprising: if the number of the remaining positions in the first cache space is greater than or equal to the preset first number, writing the acquired video data object after the last video data object in the first cache space;
if the number of the remaining positions in the first cache space is smaller than the preset first number, writing part of the acquired video data objects from the position after the last video data object to the last position in the first cache space, and writing the remaining video data objects starting from the first position in the first cache space;
and determining the position of the video data object in the first cache space as the identification of the video data object.
15. The method according to claim 13 or 14, characterized in that the method further comprises: after the preset second number of audio data objects is obtained, if the number of the remaining positions in the second cache space is larger than or equal to the preset second number, writing the obtained audio data objects after the last audio data object in the second cache space;
if the number of the remaining positions in the second cache space is smaller than the preset second number, writing part of the acquired audio data objects from the position after the last audio data object to the last position in the second cache space, and writing the remaining audio data objects starting from the first position in the second cache space;
determining a position of an audio data object in the second buffer space as an identification of the audio data object.
16. An electronic device, characterized in that the electronic device comprises:
one or more processors;
one or more memories;
the memory stores one or more programs that, when executed by the processor, cause the electronic device to perform the video acquisition method of any of claims 1-15.
17. A computer-readable storage medium, in which a computer program is stored which, when executed by a processor, causes the processor to carry out the video acquisition method of any one of claims 1 to 15.
CN202211065711.8A 2022-09-01 2022-09-01 Video acquisition method, electronic equipment and medium Active CN115604540B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211065711.8A CN115604540B (en) 2022-09-01 2022-09-01 Video acquisition method, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN115604540A true CN115604540A (en) 2023-01-13
CN115604540B CN115604540B (en) 2023-11-14

Family

ID=84842904

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211065711.8A Active CN115604540B (en) 2022-09-01 2022-09-01 Video acquisition method, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN115604540B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108965831A (en) * 2018-09-05 2018-12-07 北京疯景科技有限公司 Method for processing video frequency, device and intelligent visual door bell
CN110225348A (en) * 2019-06-24 2019-09-10 北京大米科技有限公司 Restorative procedure, device, electronic equipment and the storage medium of video data
CN111225171A (en) * 2020-01-19 2020-06-02 普联技术有限公司 Video recording method, device, terminal equipment and computer storage medium
WO2022042387A1 (en) * 2020-08-26 2022-03-03 华为技术有限公司 Video processing method and electronic device
CN114501002A (en) * 2022-01-04 2022-05-13 烽火通信科技股份有限公司 GOP cache-based pre-alarm video recording method and device

Also Published As

Publication number Publication date
CN115604540B (en) 2023-11-14

Similar Documents

Publication Publication Date Title
WO2020221039A1 (en) Screen projection method, electronic device and screen projection system
CN110572722B (en) Video clipping method, device, equipment and readable storage medium
CN115631258B (en) Image processing method and electronic equipment
CN108900855B (en) Live content recording method and device, computer readable storage medium and server
CN112055254B (en) Video playing method, device, terminal and storage medium
WO2023231655A9 (en) Bullet subtitle recognition method and related device
CN116052618A (en) Screen refresh rate switching method and electronic equipment
CN116708753B (en) Method, device and storage medium for determining preview blocking reason
CN115604540B (en) Video acquisition method, electronic equipment and medium
WO2023103800A1 (en) Drawing method and electronic device
CN114827454B (en) Video acquisition method and device
CN113709574B (en) Video screenshot method and device, electronic equipment and computer readable storage medium
CN113179432B (en) Display method and display device for video acquisition position
CN113507614A (en) Video playing progress adjusting method and display equipment
CN108235144B (en) Playing content obtaining method and device and computing equipment
CN113873187A (en) Cross-terminal screen recording method, terminal equipment and storage medium
WO2023185590A1 (en) Media information acquisition method and electronic device
CN116684515B (en) Seek processing method of streaming video, electronic equipment and storage medium
CN116033204B (en) Screen recording method, electronic equipment and storage medium
CN116069187B (en) Display method and electronic equipment
US20240073415A1 (en) Encoding Method, Electronic Device, Communication System, Storage Medium, and Program Product
WO2024098871A1 (en) Data processing method, device, and storage medium
CN117955950A (en) Method and equipment for joining multimedia activities
CN117130774A (en) Thread acceleration processing method and device
CN115348418A (en) Streaming media data caching method and device, electronic equipment, storage medium and product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant