CN115604540B - Video acquisition method, electronic equipment and medium


Info

Publication number: CN115604540B
Application number: CN202211065711.8A
Authority: CN (China)
Prior art keywords: video data, video, data object, audio data, audio
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN115604540A
Inventors: 肖瑶 (Xiao Yao), 杨毅轩 (Yang Yixuan), 林晨 (Lin Chen)
Assignee (current and original): Honor Device Co., Ltd.
Application filed by Honor Device Co., Ltd. with priority to CN202211065711.8A; published as CN115604540A, granted and published as CN115604540B.

Classifications

    • H04N 21/8456 — Structuring of content, e.g. decomposing content into time segments, by decomposing the content in the time domain
    • H04N 21/43072 — Synchronising the rendering of multiple content streams or additional data on devices; synchronisation of multiple content streams on the same device
    • H04N 21/44016 — Processing of video elementary streams; splicing one content stream with another, e.g. for substituting a video clip
    • H04N 21/8547 — Content authoring involving timestamps for synchronizing content
    • H04N 5/265 — Studio circuits; Mixing
    • H04N 5/93 — Regeneration of the television signal or of selected parts thereof

Abstract

The application provides a video acquisition method and device, comprising the following steps: playing a first video; receiving a recording signal for the first video; in response to the recording signal, searching the played video data for a target video data object whose interval from the last video data object equals the recording duration; determining a first start identifier at least according to the identifier of the target video data object; starting from the video data object in the played video data pointed to by the first start identifier, acquiring video data from the played video data; determining the interval between the first start identifier and the identifier of the last video data object; searching the played audio data for an audio data object spaced by that interval from the last audio data object, the identifier of the found audio data object serving as a second start identifier; starting from the audio data object in the played audio data pointed to by the second start identifier, acquiring audio data from the played audio data; and generating a second video.

Description

Video acquisition method, electronic equipment and medium
Technical Field
The present application relates to the field of multimedia technologies, and in particular, to a video acquisition method and apparatus.
Background
When an electronic device runs a video application (APP) or a live-streaming APP, the electronic device may record video data and audio data, recording a timestamp for each piece of video data and a timestamp for each piece of audio data as they are recorded. The electronic device may invoke a callback service to extract video data and audio data from the recorded data. Specifically, the electronic device first determines the start time of video data extraction, compares that start time with the timestamp of each piece of audio data to determine the start time of the audio data, extracts video data according to the video start time and audio data according to the audio start time, and synthesizes a video from the extracted video data and audio data. In this way, however, the electronic device takes a long time to determine the start time of the audio data, so video acquisition efficiency is reduced.
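For illustration, a minimal sketch of this timestamp-comparison approach being criticized; all names and data structures are assumptions for illustration, not taken from any actual implementation:

```java
import java.util.List;

// Hypothetical sketch of the prior timestamp-comparison approach: every
// recorded audio timestamp may need to be scanned to locate the audio start.
class TimestampScan {
    static int findAudioStart(long videoStartUs, List<Long> audioTimestampsUs) {
        // Linear scan over all recorded audio timestamps: O(n) per recording
        // request, which is what makes this approach slow.
        for (int i = 0; i < audioTimestampsUs.size(); i++) {
            if (audioTimestampsUs.get(i) >= videoStartUs) {
                return i; // first audio piece at or after the video start time
            }
        }
        return audioTimestampsUs.size() - 1; // fall back to the last piece
    }
}
```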
Disclosure of Invention
The application provides a video acquisition method and a video acquisition device, aiming to solve the problem that timestamp comparison reduces video acquisition efficiency. To achieve the above object, the present application provides the following technical solutions:
In a first aspect, the present application provides a video acquisition method, the method comprising: playing a first video; receiving, while the first video is playing, a recording signal for the first video, the recording signal instructing that a second video be generated from the played video data and played audio data of the first video; in response to the recording signal, searching the played video data for a target video data object whose interval from the last video data object in the played video data equals the recording duration; determining a first start identifier at least according to the identifier of the target video data object; taking the video data object in the played video data pointed to by the first start identifier as the first frame and, starting from the first frame, acquiring video data from the played video data; determining the interval between the first start identifier and the identifier of the last video data object; searching the played audio data for an audio data object spaced by that interval from the last audio data object of the played audio data, the identifier of the found audio data object serving as a second start identifier; taking the audio data object in the played audio data pointed to by the second start identifier as the first audio segment and, starting from the first audio segment, acquiring audio data from the played audio data; and generating the second video, the second video including the video data and the audio data.
According to the video acquisition method, the electronic device can search backward from the last video data object of the played video data for the first start identifier. After determining the interval between the start identifier of the video data extraction and the identifier of the last video data object, the device finds the second start identifier using that interval and the identifier of the last audio data object of the played audio data; specifically, it determines the identifier of the audio data object located that interval before the last audio data object as the second start identifier. The first start identifier points to one video data object in the played video data, which is the first frame of the extracted video data; the second start identifier points to one audio data object in the played audio data, which is the first segment of the extracted audio data. Compared with timestamp comparison, the electronic device can directly use the interval and the identifier of the last audio data object of the played audio data to quickly find the first segment of the audio data, so video acquisition efficiency is improved.
The time corresponding to the first start identifier is the start time of the video data, which is the time of the first frame in the first video; the time corresponding to the second start identifier is the start time of the audio data, which is likewise the time of the first audio segment in the first video. With the video acquisition method of this embodiment, the start time of the video data is the same as, or close to, the start time of the audio data, where "close to" means that the difference between the two start times is within a preset range, for example less than 120 ms, so audio-video synchronization is ensured while video acquisition efficiency is improved. The second video is a sub-video of the first video (i.e. a video of a segment of the first video); the first start identifier may correspond to the start identifier of the video data, and the second start identifier may correspond to the start identifier of the audio data.
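A minimal sketch of the identifier arithmetic described above, assuming positions in the cached played data serve as identifiers; the class and method names are illustrative, not the patent's code:

```java
// S = video objects cached per second, A = audio objects cached per second,
// T = recording duration in seconds; positions are used as identifiers.
class StartIdentifiers {
    /** Identifier of the target video data object: S*T positions before the last one. */
    static long targetVideoId(long lastVideoId, int s, int recordSeconds) {
        return lastVideoId - (long) s * recordSeconds;
    }

    /** Identifier of the first audio segment, derived from the video-side interval. */
    static long secondStartId(long firstStartId, long lastVideoId,
                              long lastAudioId, int s, int a) {
        long gap = lastVideoId - firstStartId;        // interval between first start id and last video id
        long b = Math.round(a * (gap / (double) s));  // total number B of audio objects (rounded if fractional)
        return lastAudioId - b;                       // B-th audio object before the last one
    }
}
```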
In one possible implementation, determining the first start identifier at least according to the identifier of the target video data object comprises: if the target video data object is an I-frame, determining the identifier of the target video data object as the first start identifier; if the target video data object is a B-frame or a P-frame, searching the played video data, at least according to the target video data object, for the first video data object with an I-frame identification corresponding to the target video data object, and determining the identifier of that video data object as the first start identifier. The first video data object with an I-frame identification may be the video data object with an I-frame identification closest to the target video data object, where "closest" means that the position difference between the two video data objects is smaller than the position difference between the target video data object and any other video data object with an I-frame identification; the position difference represents the separation of two video data objects in the first video. In other words, there is no video data object with an I-frame identification between the first video data object with an I-frame identification and the target video data object.
An I-frame can be decompressed into a picture by a video decompression algorithm without referencing any P-frame or B-frame, and an I-frame completely retains the picture information. Determining the identifier of a video data object with an I-frame identification as the first start identifier therefore means that the first frame extracted by the electronic device is an I-frame, so the first picture can be restored accurately, and P-frames or B-frames after the I-frame can be decompressed with reference to the I-frame, improving accuracy.
In one possible implementation, if the target video data object is a B-frame or a P-frame, searching the played video data, at least according to the target video data object, for the first video data object with an I-frame identification corresponding to the target video data object comprises: if the target video data object is a B-frame or a P-frame, searching from the target video data object toward the last video data object for the first video data object with an I-frame identification. That is, the search starts from the target video data object and finds the first video data object with an I-frame identification located after the target video data object, with the last video data object being the last one searched.
In one possible implementation, taking the video data object in the played video data pointed to by the first start identifier as the first frame and, starting from the first frame, acquiring video data from the played video data comprises: obtaining video data objects from the played video data, starting from the first frame and terminating at the last video data object; and combining the acquired video data objects to obtain the video data, whose duration is less than or equal to the recording duration. If the target video data object is an I-frame, the target video data object is the first frame, and the electronic device may extract the video data objects from the target video data object to the last video data object (both inclusive); since the interval between them is the recording duration, the duration of the obtained video data equals the recording duration. If the target video data object is a B-frame or a P-frame, the first frame pointed to by the first start identifier is located after the target video data object, and the duration of the video data is less than the recording duration. As shown in fig. 7 below, the video data object cached at position 50 is the target video data object, and position 50 is its identifier; but since that object is a B-frame or P-frame, the first video data object with an I-frame identification is searched for from position 50 onward. The video data object found at position 60 has an I-frame identification, so the object cached at position 60 is the first frame, video data objects are extracted from position 60 to position 500, and the duration of the video data is less than the recording duration.
In one possible implementation, if the target video data object is a B-frame or a P-frame, searching the played video data, at least according to the target video data object, for the first video data object with an I-frame identification corresponding to the target video data object comprises: if the target video data object is a B-frame or a P-frame, searching from the target video data object toward the first video data object in the played video data for the first video data object with an I-frame identification. If the video data objects in the played video data change, the first video data object may change: in fig. 7 the first video data object is the video data object at position 1, while in fig. 8 it is the video data object at position 11. A sketch of both search directions is given below.
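A hedged sketch of the two search directions described in the implementations above, assuming the I-frame identifications of the cached video data objects are available as a list of flags (an illustrative assumption):

```java
import java.util.List;

class IFrameSearch {
    /** Forward variant: search from the target toward the last video data object. */
    static int searchForward(List<Boolean> isIFrame, int target) {
        for (int i = target; i < isIFrame.size(); i++) {
            if (isIFrame.get(i)) return i;   // first I-frame at or after the target
        }
        return -1;                           // no I-frame found up to the last object
    }

    /** Backward variant: search from the target toward the first video data object. */
    static int searchBackward(List<Boolean> isIFrame, int target) {
        for (int i = target; i >= 0; i--) {
            if (isIFrame.get(i)) return i;   // nearest I-frame at or before the target
        }
        return -1;
    }
}
```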
In one possible implementation, taking the video data object in the played video data pointed to by the first start identifier as the first frame and, starting from the first frame, acquiring video data from the played video data comprises: obtaining video data objects from the played video data, starting from the first frame and terminating at the video data object spaced the recording duration from the first frame, and combining the obtained video data objects to obtain the video data, thereby satisfying the requirement of obtaining video data of the recording duration. Alternatively, it comprises: obtaining video data objects from the played video data, starting from the first frame and terminating at the last video data object, and combining the acquired video data objects to obtain the video data, whose duration is longer than the recording duration, satisfying the requirement of recording up to the last video data object.
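A sketch of the two extraction variants, under the same illustrative assumptions (positions as identifiers, S video objects cached per second):

```java
import java.util.ArrayList;
import java.util.List;

class VideoExtraction {
    /** Variant 1: extract from the first frame up to and including the last object. */
    static <T> List<T> upToLastObject(List<T> played, int firstFrame) {
        return new ArrayList<>(played.subList(firstFrame, played.size()));
    }

    /** Variant 2: extract exactly a recording duration's worth of objects. */
    static <T> List<T> fixedDuration(List<T> played, int firstFrame, int s, int recordSeconds) {
        // terminate at the object spaced the recording duration from the first frame
        int end = Math.min(firstFrame + s * recordSeconds + 1, played.size());
        return new ArrayList<>(played.subList(firstFrame, end));
    }
}
```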
In one possible implementation, the method further includes: if an obtained video data object is a special video object, disabling the special video object; or, if an obtained video data object is a special video object, adjusting the special video object according to the video data objects associated with it in the played video data, the adjusted special video object being used to obtain the video data. In some examples, disabling may be deleting or discarding: the electronic device does not use the special video object because it is not a frame of the video, so disabling it reduces the impact on the second video and prevents frames unrelated to the first video from appearing in the second video. In some examples, the adjustment may generate a video data object based on the video data objects associated with the special video object in the played video data and add the generated video data object to the video data. The special video object typically stands in for a frame that was lost during caching, so adding the regenerated video data object to the video data can improve accuracy.
In one possible implementation, adjusting the special video object according to the video data objects associated with it in the played video data comprises: adjusting the special video object according to N video data objects located before the special video object and/or M video data objects located after it in the played video data, where N and M are natural numbers greater than or equal to 1. For example, using the 5 video frames before the special video object and the 5 video frames after it, the image data of these 10 video frames is used to obtain the image data of the special video object, which yields one video frame. For instance, the average of the pixel values of the same pixel across the 10 video frames is taken as the pixel value of the corresponding pixel in the special video object; after this averaging has been completed for all pixels, the image data of the special video object is obtained.
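A minimal sketch of this averaging, assuming the neighbouring frames are available as equally sized arrays of pixel values (the patent does not specify a pixel format, so this is an illustrative assumption):

```java
class SpecialFrameRepair {
    /** neighbourFrames: e.g. the N frames before plus the M frames after the special object. */
    static int[] average(int[][] neighbourFrames) {
        int pixels = neighbourFrames[0].length;
        int[] repaired = new int[pixels];
        for (int p = 0; p < pixels; p++) {
            long sum = 0;
            for (int[] frame : neighbourFrames) sum += frame[p];
            repaired[p] = (int) (sum / neighbourFrames.length); // mean of the same pixel across frames
        }
        return repaired; // image data of the adjusted special video object
    }
}
```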
In one possible implementation, taking the audio data object in the played audio data pointed to by the second start identifier as the first audio segment and, starting from the first audio segment, acquiring audio data from the played audio data comprises: obtaining audio data objects from the played audio data, starting from the first audio segment and terminating at the last audio data object; and combining the acquired audio data objects to obtain the audio data.
In one possible implementation, the method further includes: if an acquired audio data object is a special audio object, disabling the special audio object.
In one possible implementation, the played video data is cached in a first cache space, the position of the last video data object in the first cache space is a first position, and the first position is the identifier of the last video data object. In response to the recording signal, searching the played video data for the target video data object spaced the recording duration from the last video data object comprises: determining the video data object belonging to the first video located S×T positions before the first position in the first cache space as the target video data object, where S is the number of video data objects cached per unit time and T is the recording duration. Determining the first start identifier at least according to the identifier of the target video data object comprises: starting from the target video data object, searching the first cache space for the first video data object with an I-frame identification belonging to the first video, and obtaining the position of that video data object in the first cache space; the position of the first video data object with an I-frame identification in the first cache space is determined as the first start identifier. Video data objects are then obtained from the first video data object with an I-frame identification to the last video data object. The interval between the first video data object with an I-frame identification and the last video data object is less than the recording duration, i.e. the duration of the video data is less than the recording duration, but the condition of recording up to the last cached video data object is satisfied. Since the first frame of the video data is a video data object with an I-frame identification, whose decompression references no P-frame or B-frame and which completely retains the picture information, the video data can accurately restore the first picture, and P-frames or B-frames after the I-frame can be decompressed with reference to the I-frame, improving accuracy. The first cache space may be the video loop array described below.
In one possible implementation, the played audio data is cached in a second cache space, the position of the last audio data object in the second cache space is a second position, and the second position is the identifier of the last audio data object. Determining the interval between the first start identifier and the identifier of the last video data object comprises: determining the position difference between the position corresponding to the first start identifier and the first position as the interval. Searching the played audio data for the audio data object spaced by the interval from the last audio data object, the identifier of the found audio data object serving as the second start identifier, comprises: determining the total number B of audio data objects as A×(gap/S), where A is the number of audio data objects cached per unit time and gap is the interval; and searching for the audio data object belonging to the first video at the B-th position before the second position in the second cache space, the position of the found audio data object in the second cache space being determined as the second start identifier.
If A×(gap/S) is an integer, the duration of the interval from the audio data object pointed to by the second start identifier to the last audio data object is the same as the duration of the interval from the video data object pointed to by the first start identifier to the last video data object; since the time of the last video data object is the same as the time of the last audio data object, the start time corresponding to the first start identifier is the same as the start time corresponding to the second start identifier, so video data and audio data of the same duration and the same start time are obtained, and the video data and audio data of the second video are synchronized. If A×(gap/S) is not an integer, the electronic device may round it (up, down, or to the nearest integer) so that the start time corresponding to the first start identifier and the start time corresponding to the second start identifier are close to each other, which also ensures audio-video synchronization. The second cache space may be the audio loop array described below.
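A small numeric check of this argument, using the example values A = 15 and S = 30 that appear elsewhere in this description (the program itself is illustrative):

```java
class SyncCheck {
    public static void main(String[] args) {
        int s = 30, a = 15; // video and audio objects cached per second
        for (int gap : new int[] {60, 45}) {
            double exact = a * (gap / (double) s);         // exact number of audio objects to step back
            long b = Math.round(exact);                    // rounded when not an integer
            double videoStartOffsetSec = gap / (double) s; // seconds before the last video object
            double audioStartOffsetSec = b / (double) a;   // seconds before the last audio object
            System.out.printf("gap=%d  B=%d  video=%.3fs  audio=%.3fs  skew=%.0fms%n",
                    gap, b, videoStartOffsetSec, audioStartOffsetSec,
                    Math.abs(videoStartOffsetSec - audioStartOffsetSec) * 1000);
        }
    }
}
```

For gap = 60, A×(gap/S) = 30 is an integer and the two start times coincide; for gap = 45, A×(gap/S) = 22.5 rounds to 23 and the resulting skew is about 33 ms, well within the 120 ms range mentioned above.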
In one possible implementation, before responding to the recording signal, the method further includes: acquiring video data objects and audio data objects in each unit time; if the number of video data objects acquired in a unit time is larger than a preset first number, deleting part of the video data objects so that the number of remaining video data objects equals the preset first number, the preset first number being the number of video data objects that can be cached per unit time; if the number of video data objects acquired in a unit time is smaller than the preset first number, adding special video objects marked with a special identification, so that the sum of the number of added special video objects and the number of acquired video data objects equals the preset first number; if the number of audio data objects acquired in a unit time is larger than a preset second number, deleting part of the audio data objects so that the number of remaining audio data objects equals the preset second number, the preset second number being the number of audio data objects that can be cached per unit time; and if the number of audio data objects acquired in a unit time is smaller than the preset second number, adding special audio objects marked with a special identification, so that the sum of the number of added special audio objects and the number of acquired audio data objects equals the preset second number. In this way, the preset first number of video data objects and the preset second number of audio data objects are cached per unit time.
In one possible implementation, deleting part of the video data objects includes: deleting the video data objects acquired after the S-th video data object, S being the preset first number; deleting part of the audio data objects includes: deleting the audio data objects acquired after the A-th audio data object, A being the preset second number.
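A sketch of this per-unit-time normalization, generic over video and audio objects; the names are assumptions for illustration:

```java
import java.util.List;

class PerSecondNormalizer {
    /** Keep exactly presetCount objects for this unit time: trim extras, pad with special objects. */
    static <T> void normalize(List<T> acquiredThisSecond, int presetCount, T specialObject) {
        while (acquiredThisSecond.size() > presetCount) {
            // delete the objects acquired after the presetCount-th one
            acquiredThisSecond.remove(acquiredThisSecond.size() - 1);
        }
        while (acquiredThisSecond.size() < presetCount) {
            acquiredThisSecond.add(specialObject); // pad with specially marked objects
        }
    }
}
```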
In one possible implementation, the method further includes: if the number of remaining positions in the first cache space is greater than or equal to the preset first number, writing the acquired video data objects after the last video data object in the first cache space; if the number of remaining positions in the first cache space is smaller than the preset first number, writing part of the acquired video data objects into the positions after the last video data object up to the last position of the first cache space, and writing the remaining video data objects starting from the first position of the first cache space. The position of a video data object in the first cache space is determined as the identifier of that video data object; video data objects are cached by multiplexing positions in the first cache space and can be distinguished by their identifiers.
In one possible implementation, the method further includes: after the preset second number of audio data objects has been acquired, if the number of remaining positions in the second cache space is greater than or equal to the preset second number, writing the acquired audio data objects after the last audio data object in the second cache space; if the number of remaining positions in the second cache space is smaller than the preset second number, writing part of the acquired audio data objects into the positions after the last audio data object up to the last position of the second cache space, and writing the remaining audio data objects starting from the first position of the second cache space. The position of an audio data object in the second cache space is determined as the identifier of that audio data object; audio data objects are cached by multiplexing positions in the second cache space and can be distinguished by their identifiers.
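A sketch of the wrap-around batch write described in the two implementations above, assuming a fixed-capacity loop array (illustrative names, not the patent's code):

```java
class LoopArrayWrite {
    final Object[] slots;
    int nextPos; // next physical position to write (0-based)

    LoopArrayWrite(int capacity) { slots = new Object[capacity]; }

    /** Write a batch of data objects, wrapping to the first position when the end is reached. */
    void writeBatch(Object[] batch) {
        for (Object obj : batch) {
            slots[nextPos] = obj;                // may multiplex (overwrite) an old position
            nextPos = (nextPos + 1) % slots.length;
        }
    }
}
```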
In a second aspect, the present application provides an electronic device, including: one or more processors; one or more memories; the memory stores one or more programs that, when executed by the processor, cause the electronic device to perform the video acquisition method described above.
In a third aspect, the present application provides a computer-readable storage medium in which a computer program is stored which, when executed by a processor, causes the processor to perform the above-described video acquisition method.
Drawings
FIG. 1 is a hardware block diagram of an electronic device provided by the present application;
FIG. 2 is a software architecture diagram of an electronic device provided by the present application;
FIG. 3 is a schematic diagram of a video acquisition method provided by the present application;
FIG. 4 is a flow chart of a video acquisition method provided by the present application;
FIG. 5 is a schematic diagram of the writing of audio data objects provided by the present application;
FIG. 6 is another schematic diagram of the writing of audio data objects provided by the present application;
FIG. 7 is a schematic diagram of determining a start identifier provided by the present application;
FIG. 8 is another schematic diagram of determining a start identifier provided by the present application;
FIG. 9 is a schematic diagram of processing a special audio object and a special video object provided by the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. The terminology used in the following examples is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in the specification of the application and the appended claims, the singular forms "a," "an," and "the" are intended to include plural forms such as "one or more," unless the context clearly indicates otherwise. It should also be understood that in embodiments of the present application, "one or more" means one, two, or more than two; "and/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may represent: A alone, both A and B, and B alone, where A and B may be singular or plural. The character "/" generally indicates that the associated objects are in an "or" relationship.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
In the embodiments of the present application, "a plurality of" means two or more. It should be noted that, in the description of the embodiments of the present application, the terms "first," "second," and the like are used only for distinguishing between descriptions and are not to be understood as indicating or implying relative importance or order.
An electronic device may record video data and audio data while running a video APP or a live APP. If the electronic device monitors a key event (such as a highlight moment), it can extract the video data and audio data of the key event through the callback service to obtain a video file of the key event. For example, when the electronic device is running a video APP and the user sees a highlight, the user can click a recording button in the video APP page (the recording button is one example of a trigger). After the electronic device detects the click on the recording button, it records a video file of a preset duration by invoking the recording service. In one approach, the electronic device extracts video data and audio data of the preset duration from the recorded video data and audio data and synthesizes them into a video file whose playing duration is the preset duration. The electronic device must reduce or avoid the delay between the timestamp of the video data and the timestamp of the audio data, achieving audio-video synchronization through timestamps. In this timestamp-based synchronization method, the electronic device records a timestamp for each piece of video data when recording it, and likewise records a timestamp for each piece of audio data.
Before extracting video data and audio data, the electronic device may determine the start time of video data extraction, compare that start time with the timestamp of each piece of audio data to determine the start time of the audio data, then extract video data according to the video start time and audio data according to the audio start time. In this way, the electronic device takes a long time to determine the start time of the audio data, and video acquisition efficiency is reduced.
The application provides a video acquisition method that can search for the start identifier of the video data from the last video data object of the played video data (i.e. the recorded video data); the start identifier of the video data points to the start time of video data extraction. The interval between the start identifier of the video data extraction and the identifier of the last video data object is determined, and the start identifier of the audio data is found using that interval and the identifier of the last audio data object of the played audio data; the start identifier of the audio data points to the start time of audio data extraction. Compared with timestamp comparison, the electronic device can directly use the interval and the identifier of the last audio data object of the played audio data, so the start time of audio data extraction can be found quickly and video acquisition efficiency is improved.
If the electronic device plays a video, the electronic device may implement the video acquisition method to record the played video and obtain a video file in which the video data and audio data are synchronized. For example, the electronic device may implement the video acquisition method when running an APP that outputs both video frames (pictures) and sound, such as a video APP, a live APP, or a game APP.
In some examples, the video acquisition method may fix the number of audio data objects and the number of video data objects recorded (buffered) per unit time (e.g. per second), with each audio data object and each video data object having a unique identifier. For example, each audio data object is buffered in an audio loop array and each video data object is buffered in a video loop array, each with a unique array index; the position of each data object in its array serves as its unique identifier. When determining the start times of the video data and the audio data, the start identifier of the video data and the start identifier of the audio data are quickly located according to the positions of the data objects in the arrays, so that the start time of the video data is the same as, or close to, the start time of the audio data; for example, the difference between the two start times is within a preset range (e.g. less than 120 ms).
The video capture method may be applied to the electronic device shown in fig. 1, which may be, in some embodiments, a smart screen, a cell phone, a tablet, a desktop computer, a laptop, a notebook, an ultra-mobile personal computer (UMPC), a handheld computer, a netbook, a personal digital assistant (PDA), a wearable electronic device, a projector, etc. The specific form of the electronic device is not particularly limited in the present application.
As shown in fig. 1, the electronic device may include: a processor, an external memory interface, an internal memory, a universal serial bus (USB) interface, a charge management module, a power management module, a battery, antenna 1, antenna 2, a mobile communication module, a wireless communication module, a sensor module, keys, a motor, an indicator, an audio module, a camera, a display, a subscriber identity module (SIM) card interface, and the like. The audio module may include a speaker, a receiver, a microphone, an earphone interface, etc., and the sensor module may include a pressure sensor, a gyro sensor, a barometric sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, etc.
It is to be understood that the configuration illustrated in this embodiment does not constitute a specific limitation on the electronic apparatus. In other embodiments, the electronic device may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor may include one or more processing units; for example, the processor may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. The different processing units may be separate devices or may be integrated in one or more processors. The processor is the nerve center and command center of the electronic device; the controller can generate operation control signals according to instruction operation codes and timing signals to complete the control of instruction fetching and instruction execution.
The external memory interface may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device. The external memory card communicates with the processor through an external memory interface to realize the data storage function. For example, video data, audio data, and the like are stored in an external memory card. The internal memory may be used to store computer-executable program code that includes instructions. The processor executes the instructions stored in the internal memory to perform various functional applications of the electronic device and data processing. For example, in the present application, the processor causes the electronic device to execute the video acquisition method provided by the present application by executing the instructions stored in the internal memory.
The display screen may be used to display images or video. The display screen includes a display panel. The display panel may employ a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini LED, a Micro LED, a Micro OLED, or quantum dot light-emitting diodes (QLED). In some embodiments, the electronic device may include 1 or N display screens, N being a positive integer greater than 1.
Speakers, also known as "horns," are used to convert audio electrical signals into sound signals. The electronic device may broadcast audio data through a speaker, etc., such as when the electronic device is playing video, the audio data is played through the speaker to restore sound in the video.
In addition, an operating system is run on the components. Such as the iOS operating system developed by apple corporation, the Android open source operating system developed by google corporation, the Windows operating system developed by microsoft corporation, etc. An operating application may be installed on the operating system.
The operating system of the electronic device may employ a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. In the embodiments of the present application, an Android system with a layered architecture is taken as an example to illustrate the software and hardware structure of the electronic device. Fig. 2 is a software architecture block diagram of an electronic device. The layered architecture divides the software into several layers, each with a clear role and division of labor; the layers communicate with each other through software interfaces. Taking the Android system as an example, in some embodiments the system is divided into four layers, namely, from top to bottom, an application layer, an application framework layer (Framework), a hardware abstraction layer (HAL) & kernel layer, and a hardware layer.
The application layer may include a series of application packages. The application packages can include a video APP, a live APP, and other APPs. The application layer can further include some service APPs that provide services for the APPs corresponding to the application packages; when such an APP runs, the service APPs serving it also run. For example, the application layer can include a television service APP (TvServiceTv) and a screen-recording APP (WonderfulTv). TvServiceTv can be used during video playing by the video APP, the live APP, and the like, and WonderfulTv can be used for playback recording, where a recording indicates that the video has a highlight moment or the like; a recording can be triggered by the user or triggered automatically after the electronic device detects a highlight moment.
The application framework layer provides an application programming interface (API) and a programming framework for the applications of the application layer; it includes a number of predefined functions. For example, the application framework layer may include an audio policy executor (AudioFlinger), a media player (MediaPlayer), and a window system (SurfaceFlinger). AudioFlinger is used for outputting audio data, MediaPlayer is used for outputting video data, and SurfaceFlinger is used for outputting interface data, such as progress bar data, which represents the playing progress of the video.
The HAL layer & kernel layer may provide call interfaces for the upper layer and receive data sent by the upper layer. For example, the HAL layer & kernel layer includes an audio output (AO) module, a video output (VO) module, an on-screen display (OSD) module, and a mixing (MIX) module. AudioFlinger outputs audio data to the AO module, and the audio data in the AO module can be played through a speaker; MediaPlayer outputs video data to the VO module, and the video data in the VO module can be displayed through the display screen; SurfaceFlinger outputs interface data to the OSD module, and the interface data in the OSD module can be displayed through the display screen, e.g. to show the playing progress while the display shows a picture (an image restored from video data). The MIX module may interact with the VO module and the OSD module to mix video data with interface data: the video data restores an image (picture), the interface data restores an interface, and mixing them adds the interface to the image, i.e. obtaining an image with an interface can be regarded as adding the interface data to the video data.
The HAL layer & kernel layer may also include some functions or interfaces; for example, it may include an audio capture interface (AudioCapture) that can capture data from the AO module, and a screen capture interface (ScreenCapture) that can capture data from the MIX module.
The driver layer may provide device drivers; for example, the driver layer includes a high-definition multimedia interface (HDMI), digital TV (DTV), audio and video (AV), and the like.
The interaction flow between the layers in the software structure is shown in fig. 3. When the electronic device runs the video APP to play a video, the video APP can send video playing signals to AudioFlinger, MediaPlayer, and SurfaceFlinger, which respond to them respectively: AudioFlinger transmits the audio data of the video to the AO module, which may store the audio data; MediaPlayer transmits the video data (which may also be called image data or picture data) of the video to the VO module, which may store the video data; and SurfaceFlinger sends the interface data of the video (such as progress bar data) to the OSD module, which may save the interface data. When the electronic device runs the live APP for live streaming, the live APP can send live signals to the HDMI/DTV/AV of the driver layer so as to decode audio data to the AO module and video data to the VO module.
When the video APP or the live APP starts playing a video, a start-of-display signal is sent through the VO module, and when it finishes playing, an end-of-display signal is sent through the VO module. TvServiceTv can monitor the start-of-display and end-of-display signals; if it detects the start-of-display signal, TvServiceTv sends a display notification to WonderfulTv, which responds by running WonderfulTv's screen recording service (ScreenRecordService). The ScreenRecordService starts recording the audio and video by calling the interfaces encapsulated by TvServiceTv: through TvServiceTv it calls AudioCapture and ScreenCapture, where AudioCapture can obtain audio data from the AO module and ScreenCapture can obtain mixed data from the MIX module. The mixed data is obtained at least from the video data in the VO module; if corresponding interface data exists in the OSD module, the mixed data is obtained from both the video data and the interface data.
During playing, the ScreenRecordService continuously calls AudioCapture and ScreenCapture through TvServiceTv and keeps recording audio and video. When the user performs a recording operation at a highlight moment, WonderfulTv detects the recording signal and, in response, calls the highlight-moment recording interface (recordMoment) of the recording manager (RecordManager), writing the video data and audio data of the recording into an audio-video synthesizer (MediaMuxer), which synthesizes the video data and audio data into a video file. The video file includes the video data and audio data, whose start times are the same or close to each other. RecordManager can be a class of the video APP and live APP in the application layer, RecordManager has the interface recordMoment, and MediaMuxer can be a standard file-synthesis tool in the application layer.
Referring to fig. 4, fig. 4 shows the process from the ScreenRecordService starting to record audio and video to the MediaMuxer synthesizing the video file, taking a smart screen playing a video as an example. The video acquisition method may include the following steps:
step S101, the intelligent screen plays video, and the intelligent screen starts the Screen Reccord service.
Step S102, the ScreenRecordService calls the interfaces encapsulated by TvServiceTv to obtain the audio and video; TvServiceTv encapsulates JNI interfaces, such as the interface AudioCapture for obtaining audio data and the interface ScreenRecord for obtaining video data.
TvServiceTv can initialize the video recording service; for example, TvServiceTv can initialize the number of video data objects recorded per unit time, the frame rate, and so on. For instance, TvServiceTv initializes S, the number of video frames (i.e. video data objects) acquired per second, and the frame rate can then be determined from the initialized number of video frames. The number of audio data objects and the duration of each piece of audio data may be set in advance so that the duration of each piece of audio data is fixed; of course, they may also be initialized by TvServiceTv, for example recording A pieces of audio data per second. The values of A and S are greater than 1, and the value of A may be less than the value of S, for example A = 15 and S = 30, which is not limited in this embodiment.
Step S103, AudioCapture acquires A pieces of audio data every second, and the A pieces of audio data form an audio data array. After AudioCapture completes one second of audio data acquisition, the audio data array is stored in an audio buffer; the audio data buffered in the audio buffer are called audio data objects (AudioFrame). The audio data in the audio data array and the audio buffer are ordered by acquisition order; for example, earlier-acquired audio data is ordered earlier in the audio data array. Because the audio data are ordered by acquisition order, which also represents their order when the video is played, the positional relationship among the audio data is the same as when the video is played: the positional relationship represents the playing order (i.e. acquisition order) of the audio data, and the earlier a piece of audio data is played, the earlier it is ordered in the audio data array.
AudioCapture can acquire audio data based on the preset audio data duration, so that the A pieces of audio data acquired each second have the same duration. For example, if 10 pieces of audio data are acquired every second and the duration of each piece is 100 ms, AudioCapture can acquire one piece of audio data every 100 milliseconds (ms): within one second the audio data is split as it is acquired, one piece every fixed interval (e.g. 100 ms).
In some examples, TvServiceTv may initialize the number of audio data objects, and AudioCapture may split a segment of audio data acquired over one second to obtain a number of pieces equal to the number of audio data objects. If AudioCapture collects a segment of audio data with a duration of 640 ms in one second and the number of audio data objects is 10, AudioCapture can divide the 640 ms segment into 10 pieces of audio data, each with a duration of 64 ms. Equal division is only an example; the split may also be uneven, which is not described further here.
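A sketch of this equal split, assuming the captured segment is available as a PCM byte array (an assumption for illustration; the patent does not specify the sample format):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AudioSplitter {
    /** Divide one captured chunk (e.g. 640 ms of audio) into the configured number of pieces. */
    static List<byte[]> splitEqually(byte[] chunk, int pieces) {
        List<byte[]> out = new ArrayList<>(pieces);
        int pieceLen = chunk.length / pieces; // equal division; any remainder goes to the last piece
        for (int i = 0; i < pieces; i++) {
            int from = i * pieceLen;
            int to = (i == pieces - 1) ? chunk.length : from + pieceLen;
            out.add(Arrays.copyOfRange(chunk, from, to));
        }
        return out; // e.g. 10 pieces of 64 ms each for a 640 ms chunk
    }
}
```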
Step S104, the audio data objects in the audio buffer are written into an audio loop array for storage. The ordering of the audio data objects is unchanged when they are written: the currently written audio data objects are placed, in acquisition order, after the previously written ones, so the audio data objects in the audio loop array are ordered by acquisition order, which is the same as their order when the video is played. Therefore, when audio data is extracted, it can be extracted according to its ordering in the audio loop array, ensuring the accuracy of the extracted audio data.
For example, 15 pieces of audio data are acquired per second and ordered by acquisition order: as shown in fig. 5, the first acquired piece is at position 1, the second at position 2, ..., and the last piece acquired within the second at position 15. These 15 pieces form an audio data array stored in the audio buffer, and the audio data objects in the audio buffer are written into the audio loop array in that order. In fig. 5, the audio loop array has already been written up to position 30, so the audio data objects ordered 1 to 15 in the audio buffer are written to positions 31 to 45. The array index of a piece of audio data can be obtained from its position in the audio loop array: if the position of the audio data is 31, its array index is 30, and audio loop array[30] refers to the audio data at position 31. The array index of an audio data object in the audio loop array is the unique identifier of that audio data object. While audio data objects are being written from the audio buffer into the audio loop array, AudioCapture can continue acquiring audio data, and after another 15 pieces have been acquired they can again be written into the audio buffer in order.
The audio loop array may have a maximum number of positions; if the remaining positions do not meet the writing requirement, positions are multiplexed (reused). Not meeting the requirement may mean that the number of remaining positions is smaller than the total number of audio data objects in the audio buffer. For example, as shown in fig. 6, suppose the loop array holds at most 1000 objects and has been written up to position 990. The buffer's objects at positions 1 to 10 fill the remaining positions 991 to 1000, and the objects at positions 11 to 15 reuse the first 5 positions of the loop array; their array subscripts, however, are 1001 to 1005. The positions may likewise be relabeled 1001 to 1005 to indicate that these positions are multiplexed and that the 5 objects were acquired later; in that case the position of an object in the loop array can still serve as its unique identifier. If the reused slots instead kept positions 1 to 5, using the position alone as the unique identifier could cause data-reading errors; in that case the position together with the object's acquisition time serves as the identifier, the acquisition time distinguishing acquisition order so that, when positions are multiplexed, the search direction for the start identifier of the audio data is determined by acquisition time.
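The subscript scheme above can be sketched as a ring array whose write subscript grows monotonically while physical slots are reused, so that a subscript such as 1001 never collides with subscript 1. This is a minimal sketch under that assumption (0-based subscripts, hypothetical names), not the patent's actual code.

    // A loop array in which every written object receives a monotonically
    // increasing subscript. Physical slots are multiplexed once the array is
    // full, but subscripts stay unique, matching the 1001..1005 example above.
    public class LoopArray<T> {
        private final Object[] slots;
        private long nextSubscript = 0; // grows forever, never reset on wraparound

        public LoopArray(int capacity) {
            slots = new Object[capacity];
        }

        // Writes one object and returns its unique subscript.
        public long write(T item) {
            long subscript = nextSubscript++;
            slots[(int) (subscript % slots.length)] = item; // reuse positions
            return subscript;
        }

        @SuppressWarnings("unchecked")
        public T read(long subscript) {
            // Valid only while the slot has not been overwritten by a later write.
            if (subscript >= nextSubscript || subscript < nextSubscript - slots.length) {
                throw new IllegalArgumentException("subscript no longer available");
            }
            return (T) slots[(int) (subscript % slots.length)];
        }
    }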
In some examples, after the audio data objects in the audio buffer are written to the audio loop array, the buffer is emptied so that it can store audio data objects again. One point deserves explanation: if the total duration of the audio captured within one second is less than one second, acquiring one piece per fixed interval leaves some pieces empty, but the acquisition order is still determined by each piece's position within that second. The audio data in the buffer can thus be ordered by their playback time points within the second, and once written into the loop array in acquisition order, each piece is effectively stored at its actual playback time point, i.e. the time point at which it plays during video playback. Although the loop array does not record a time for each piece, the ordering preserves those playback time points. When several pieces are extracted from the loop array, they can be reassembled into a segment of audio in loop-array order, and the reassembled audio matches the audio as played in the video, improving accuracy.
Step S105, a ScreenRecord acquires S video frames every second, and the S frames form a video data array stored in a video buffer. The frames buffered there are called video frame objects (ScreenFrame), and may also be called video data objects. If the frame in a video frame object is an I frame, the object also carries an I-frame identifier.
If ScreenRecord determines that a video frame is an I frame, it adds an I-frame identifier to the frame. For example, with a Boolean value as the identifier, true is added to an I frame, and false is added to a P frame or B frame to indicate that the frame is not an I frame. The identifier may be added when (or just before) ScreenRecord writes the video data array into the video buffer, or right after ScreenRecord captures a frame and determines whether it is an I frame, in which case the frames in the video data array already carry their I-frame identifiers.
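The patent does not state how ScreenRecord decides that a frame is an I frame. On an Android-style media stack the encoder output carries a key-frame flag, so one plausible sketch (an assumption, not the patent's method) tags each frame as follows:

    import android.media.MediaCodec;

    // Holds one encoded video frame together with its I-frame identifier.
    class ScreenFrame {
        final byte[] data;
        final boolean iFrame; // true for an I frame, false for a P or B frame

        ScreenFrame(byte[] data, boolean iFrame) {
            this.data = data;
            this.iFrame = iFrame;
        }

        // Assumed detection path: the encoder's BufferInfo carries a key-frame flag.
        static ScreenFrame fromEncoderOutput(byte[] encoded, MediaCodec.BufferInfo info) {
            boolean isIFrame = (info.flags & MediaCodec.BUFFER_FLAG_KEY_FRAME) != 0;
            return new ScreenFrame(encoded, isIFrame);
        }
    }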
The frames in the video data array may be ordered by acquisition order, earlier-captured frames sitting earlier in the array, and the video frame objects in the video buffer follow the same ordering, so the ordering in the array and in the buffer is identical. Because the acquisition order is also the order in which frames play, the positional relationship among the frames matches their playback order, i.e. the order in which they are displayed on screen: the earlier a frame is displayed, the earlier it sits in the video data array.
In some examples, tvServiceTv may initialize the number of video frame objects and the frame rate, and ScreenRecord captures frames at that frame rate, e.g. 30 frames per second; the value of S is not limited here.
Step S106, the video frame objects in the video buffer are written into a video loop array for storage. Their ordering is preserved on writing: objects written now are placed, in acquisition order, after objects written earlier, so the objects in the video loop array are ordered by acquisition order, which matches the playback order of the frames. Frames can therefore be extracted in loop-array order, ensuring the extracted frames are accurate.
Writing video frame objects into the video loop array is similar to writing audio data objects into the audio loop array; see the example of fig. 5, which is not repeated here. In some examples, after the objects in the video buffer are written to the loop array, the buffer is emptied so it can store video frame objects again. The array subscript of a video frame object in the video loop array serves as its unique identifier. The video loop array may have a maximum number of positions; if the remaining positions do not meet the writing requirement, positions are multiplexed, as in the example of fig. 6, which is likewise not repeated here.
S107, when wonderfullTv detects that the user has triggered a highlight moment, it calls the highlight-moment recording interface (recordmovement) of the recording manager (RecordManager) to determine the start identifier of the video data and the interval gap between the video frame corresponding to that identifier and the last video frame. The user triggering a highlight moment indicates that the user has triggered the recording service, in which case wonderfullTv may call recordmovement to extract video data and audio data.
Before extracting video data and audio data, recordmovement must determine the start identifier of the video data. As shown in fig. 4, when the user's highlight-moment trigger is detected, the current write position of the video loop array is SN, i.e. the last video frame written into the loop array sits at position SN. recordmovement searches, starting from the position S*T before SN, for the first video frame object carrying an I-frame identifier, and uses that object's position in the video loop array as the start identifier of the video data.
T is the recording duration and S the number of video frame objects acquired per second, so S*T is the number of video frame objects acquired within duration T. Call the position S*T before SN position SI; the interval from SI to SN spans duration T, and recordmovement therefore searches forward from SI for the first video frame object with an I-frame identifier.
The frame in the first video frame object with an I-frame identifier is an I frame. An I frame can be decompressed into a picture by a video decompression algorithm without referencing any P frame or B frame, because it fully preserves the picture information. Using the position of this object in the video loop array as the start identifier of the video data therefore means the first extracted frame is an I frame: the first picture can be restored accurately, and the P frames and B frames after it can reference it during decompression, improving accuracy.
S108, recordmovement extracts video frame objects from the start identifier to position SN. Since the first object with an I-frame identifier sits at or after position SI, the duration covered by the extracted objects is less than or equal to T, and the extraction still ends at position SN, the user's trigger point.
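Steps S107 and S108 can be sketched as follows, reusing the illustrative LoopArray and ScreenFrame types from the earlier sketches: step back S*T positions from SN, scan forward for the first I-frame-flagged object, and treat its subscript as the start identifier. A minimal sketch, not the patent's actual code.

    class StartFinder {
        // Returns the start identifier of the video data: the subscript of the
        // first I-frame-flagged object at or after position SI = SN - S*T.
        static long findVideoStart(LoopArray<ScreenFrame> video, long sn, int s, int t) {
            long si = Math.max(0, sn - (long) s * t); // position SI
            for (long pos = si; pos <= sn; pos++) {
                if (video.read(pos).iFrame) {
                    return pos; // first object with the I-frame identifier
                }
            }
            return sn; // degenerate case: no I frame in the window
        }
    }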
In some examples, recordmovement may instead use the position S*T before SN directly as the start identifier of the video data, so that the extracted objects cover exactly duration T and still end at position SN. The first extracted frame, however, may then be a B frame or P frame, which must reference an I frame to be decompressed; since it is the first frame, no I frame precedes it, which raises the probability of a decompression error and lowers accuracy.
In some examples, recordmovement may search toward earlier positions, starting from the S*T-th position before SN, for the first frame with an I-frame identifier. Suppose it finds one at position SH; the span from SH to SN then covers more than duration T. If SH were used as the start identifier, then to keep the recording duration at T the extraction would have to stop at a position SZ such that SH to SZ covers exactly T, leaving the objects from SZ to SN unextracted and failing to meet the requirement that the recording end at the user's trigger point SN.
S109, determining the start identifier of the audio data based on the current write position AN of the audio loop array, the start identifier of the video data, and the interval gap to position SN. gap is the number of video frame objects from the video start identifier to position SN; recordmovement can use the position A*(gap/S) before AN as the start identifier of the audio data, where (gap/S) is the duration covered by the extracted video frame objects.
S110, recordmovement extracts the audio data objects from the start identifier of the audio data to position AN.
The video frame objects and audio data objects extracted by recordmovement are written into MediaMuxer, which synthesizes them into a Moving Picture Experts Group 4 (MP4) file, i.e. a video file in MP4 format. The MP4 file can be stored in the memory of the electronic device and is the video file of the sub-video of the video played in step S101.
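The patent does not say which platform's MediaMuxer is meant. If it behaves like Android's android.media.MediaMuxer, the synthesis step could look roughly like the sketch below; the Sample container type is hypothetical.

    import android.media.MediaCodec;
    import android.media.MediaFormat;
    import android.media.MediaMuxer;
    import java.io.IOException;
    import java.nio.ByteBuffer;

    // Writes already-encoded video and audio samples into a single MP4 file.
    class Mp4Writer {
        // Hypothetical container for one extracted frame or audio piece.
        static class Sample {
            boolean isVideo;
            ByteBuffer buffer;
            MediaCodec.BufferInfo info; // carries presentation timestamps
        }

        static void mux(String outputPath, MediaFormat videoFormat,
                        MediaFormat audioFormat, Iterable<Sample> samples)
                throws IOException {
            MediaMuxer muxer =
                    new MediaMuxer(outputPath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4);
            int videoTrack = muxer.addTrack(videoFormat); // tracks added before start()
            int audioTrack = muxer.addTrack(audioFormat);
            muxer.start();
            for (Sample s : samples) {
                muxer.writeSampleData(s.isVideo ? videoTrack : audioTrack,
                        s.buffer, s.info);
            }
            muxer.stop();
            muxer.release();
        }
    }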
How the start identifier of the video data and the start identifier of the audio data are determined is described below with examples in which A=15, S=30, and T=15 s. Fig. 7 shows one example: when recordmovement detects that the user triggered a highlight moment, the current write position of the video loop array is 500 and S*T=450, so recordmovement searches for the first video frame object with an I-frame identifier starting from the 450th position before position 500, i.e. from position 50 toward position 500. Suppose it finds that object (marked by the I-frame identifier in fig. 7) at position 60: recordmovement then extracts the video frame objects from position 60 to position 500, position 60 being the start identifier and position 500 the end identifier of the extracted video data.
The interval between positions 60 and 500 is gap=440, and A*(gap/S)=15*(440/30)=220. The current write position of the audio loop array is 300, so audio data objects are extracted from the 220th position before position 300, i.e. from position 80 to position 300; position 80 is the start identifier and position 300 the end identifier of the audio extraction. The span from position 80 to position 300 is 220, and 220/15=440/30, so the extracted video data and audio data have the same duration. Positions SN and AN correspond to the same time, meaning the video data and audio data end at the same moment; since their durations are equal, their start moments are also the same, i.e. the start identifier of the video data and the start identifier of the audio data correspond to the same time point. Extracting video data and audio data of equal duration from the same start time keeps the audio and video in the resulting video file synchronized.
Fig. 8 shows another example of determining the start identifier of the video data and the start identifier of the audio data, this time in a position-multiplexing scenario: positions 1 to 10 of the video loop array are multiplexed and its current write position is 1010; positions 1 to 5 of the audio loop array are multiplexed and its current write position is 1005.
Since S*T=450, the 450th position before position 1010 in the video loop array is position 660, so recordmovement searches from position 660 to position 1010 for the first video frame object with an I-frame identifier. Suppose it finds that object at position 680: recordmovement extracts the video frame objects from position 680 to position 1010. The interval between positions 680 and 1010 is gap=330, and A*(gap/S)=15*(330/30)=165. The current write position of the audio loop array is 1005, so audio data objects are extracted from the 165th position before position 1005, i.e. from position 840 to position 1005.
In the example of fig. 7 the extracted duration is 14.67 s, less than 15 s; in the example of fig. 8 the duration extracted by recordmovement is 11 s, also less than 15 s, so the two extractions differ in duration. recordmovement may of course determine the start identifier in other ways: taking the video loop array of fig. 7 as an example, it may use the 450th position before position 500 directly as the start identifier and extract the objects from position 50 to position 500, giving an extracted duration of (500-50)/30=15 s, equal to the set duration T.
If A*(gap/S) is not an integer, recordmovement may round it to the nearest integer, round it down, or round it up. For example, A*(gap/S)=17.8 means extraction should start from the 17.8th position before the current write position; since 17.8 is not an integer, recordmovement rounds it to 18 and starts extracting audio data objects from the 18th position before the current write position. The audio extraction then starts slightly earlier than the video extraction, but the start time of the audio data remains close to the start time of the video data.
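The audio start computation of steps S109 and S110, including the rounding rule just described, can be sketched as follows (rounding up is one of the options named above; the class and method names are illustrative):

    class AudioStartFinder {
        // Returns the start identifier of the audio data: A * (gap / S)
        // positions before the audio loop array's current write position AN.
        static long findAudioStart(long an, int a, long gap, int s) {
            double back = a * (gap / (double) s);        // A * (gap / S), may be fractional
            long positionsBack = (long) Math.ceil(back); // e.g. 17.8 -> 18
            return Math.max(0, an - positionsBack);
        }
    }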
In some examples, if AudioCapture acquires more than A pieces of audio data in a second, the pieces after the A-th are discarded (i.e. deleted); if it acquires fewer than A, special audio objects are used as padding so that A pieces per second are still obtained. A special audio object may be empty audio data or audio data with fixed content, and it carries a special mark so that it can be distinguished when audio data is extracted.
Likewise, if ScreenRecord acquires more than S video frames in a second, the frames after the S-th are discarded (i.e. deleted); if it acquires fewer than S, special video objects are used as padding so that S frames per second are still obtained. A special video object may be an empty frame or a frame with fixed content, and it carries a special mark so that such frames can be distinguished when they are extracted.
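The pad-or-discard rule for both streams can be sketched as below; the SPECIAL marker object is hypothetical, standing in for a specially marked empty frame or audio piece.

    import java.util.ArrayList;
    import java.util.List;

    // Normalizes one second of captured objects to exactly `target` entries.
    class Normalizer {
        // Hypothetical marker standing in for a special audio/video object.
        static final Object SPECIAL = new Object();

        static List<Object> normalize(List<Object> captured, int target) {
            List<Object> out = new ArrayList<>(captured);
            while (out.size() > target) {
                out.remove(out.size() - 1); // discard objects past the target count
            }
            while (out.size() < target) {
                out.add(SPECIAL);           // pad with specially marked objects
            }
            return out;
        }
    }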
In a scenario containing special audio objects and special video objects, when recordmovement extracts video frames and audio data objects: if an audio data object to be extracted is a special audio object, recordmovement may skip it (i.e. not extract it, thereby disabling it) or extract it but disable it; if a video data object to be extracted is a special video object, recordmovement may skip it, or adjust it based on several video frames adjacent to it.
In some examples, recordmovement may adjust a special video object based on several frames before and after it, for example using the 5 frames before and the 5 frames after: the image data of those 10 frames is used to derive the image data of the special video object, which then yields one video frame. For instance, the average of the pixel values of the same pixel across the 10 frames becomes the pixel value of the corresponding pixel in the special video object; once the averaging has been done for all pixels, the image data of the special video object is obtained.
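A sketch of that per-pixel averaging is given below. It assumes the neighboring frames have already been decoded into pixel arrays of equal length; for packed ARGB data a real implementation would average each channel separately.

    import java.util.List;

    class FrameAverager {
        // Rebuilds a special video object as the per-pixel mean of its decoded
        // neighbors (e.g. the 5 frames before and the 5 frames after it).
        static int[] averageFrames(List<int[]> neighborFrames) {
            int len = neighborFrames.get(0).length;
            int[] result = new int[len];
            for (int i = 0; i < len; i++) {
                long sum = 0;
                for (int[] frame : neighborFrames) {
                    sum += frame[i]; // same pixel across all neighbor frames
                }
                result[i] = (int) (sum / neighborFrames.size());
            }
            return result;
        }
    }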
In some examples, recordmovement may adjust a special video object based on several frames before it; in other examples, based on several frames after it. The process is as described above and is not repeated here.
If, while adjusting a special video object, recordmovement reads another special video object among the frames before or after it, it may skip that object; alternatively, it may first adjust the special video object it read and then adjust the one being processed using the ordinary frames together with the already-adjusted special video object. recordmovement may choose how to handle special video objects in the video loop array according to current power consumption, resource consumption, and the like; for example, it may perform the adjustment when the CPU is idle.
Fig. 9 shows one example of recordmovement handling special audio objects and special video objects: special audio objects are skipped, while a special video object is adjusted using the 5 frames before it and the 5 frames after it.
In addition, the present application provides an electronic device, including: one or more processors and one or more memories, wherein the memories store one or more programs that, when executed by the processors, cause the electronic device to perform the video acquisition method described above.
The present application provides a computer-readable storage medium in which a computer program is stored which, when executed by a processor, causes the processor to perform the video acquisition method described above.

Claims (16)

1. A method of video acquisition, the method comprising:
playing the first video;
receiving a recording signal for the first video in the process of playing the first video, the recording signal indicating that a second video of the first video is to be generated from the played video data and played audio data of the first video;
in response to the recording signal, searching the played video data for a target video data object whose interval from the last video data object in the played video data corresponds to the recording duration;
determining a first start identifier at least according to the identifier of the target video data object;
taking a video data object in the played video data pointed to by the first start identifier as a first frame, and, starting from the first frame, acquiring video data from the played video data;
determining an interval between the first start identifier and the identifier of the last video data object;
determining an audio data interval according to the interval, and searching the played audio data for an audio data object spaced from the last audio data object of the played audio data by the audio data interval, the identifier of the found audio data object serving as a second start identifier, wherein the duration corresponding to the interval is equal to the duration corresponding to the audio data interval;
taking the audio data object in the played audio data pointed to by the second start identifier as a first piece of audio, and, starting from the first piece of audio, acquiring audio data from the played audio data;
generating the second video, the second video including the video data and the audio data;
wherein said determining a first start identifier at least according to the identifier of said target video data object comprises: if the target video data object is an I frame, determining the identifier of the target video data object as the first start identifier; if the target video data object is a B frame or a P frame, searching the played video data, at least according to the target video data object, for the first video data object with an I-frame identifier corresponding to the target video data object, and determining the identifier of that first video data object with the I-frame identifier as the first start identifier.
2. The method of claim 1, wherein if the target video data object is a B-frame or a P-frame, searching the played video data for the first video data object with the I-frame identifier corresponding to the target video data object based at least on the target video data object comprises:
and if the target video data object is a B frame or a P frame, searching, from the target video data object toward the last video data object, for the first video data object with the I-frame identifier corresponding to the target video data object.
3. The method of claim 2, wherein the video data object in the played video data pointed to by the first start identifier is a first frame, and wherein starting from the first frame, obtaining video data from the played video data comprises:
obtaining a video data object from the played video data from the beginning of the first frame to the ending of the last video data object; and combining the acquired video data objects to obtain video data, wherein the duration of the video data is less than or equal to the recording duration.
4. The method of claim 1, wherein if the target video data object is a B frame or a P frame, searching the played video data, at least according to the target video data object, for the first video data object with the I-frame identifier corresponding to the target video data object comprises:
If the target video data object is a B frame or a P frame, searching, from the target video data object toward the first video data object in the played video data, for the first video data object with the I-frame identifier corresponding to the target video data object.
5. The method of claim 4, wherein the video data object in the played video data pointed to by the first start identifier is a first frame, and wherein starting from the first frame, obtaining video data from the played video data comprises: starting from the first frame, obtaining video data objects from the played video data until the video data object spaced the recording duration from the first frame is reached, and combining the obtained video data objects to obtain video data;
or alternatively
The step of taking the video data object in the played video data pointed to by the first start identifier as a first frame, and starting from the first frame, obtaining video data from the played video data includes: obtaining video data objects from the played video data from the beginning of the first frame to the end of the last video data object; and combining the acquired video data objects to obtain video data, wherein the duration of the video data is longer than the recording duration.
6. The method according to claim 3 or 5, characterized in that the method further comprises: if the obtained video data object is a special video object, disabling the special video object;
or alternatively
And if the obtained video data object is a special video object, adjusting the special video object according to the video data object related to the special video object in the played video data, wherein the adjusted special video object is used for obtaining the video data.
7. The method of claim 6, wherein adjusting the special video object based on the video data object associated with the special video object in the played video data comprises:
and adjusting the special video object according to N video data objects positioned before the special video object and/or M video data objects positioned after the special video object in the played video data, wherein N and M are natural numbers which are greater than or equal to 1.
8. The method according to any one of claims 1 to 5, wherein the audio data object in the played audio data pointed to by the second start identifier is a first piece of audio, and wherein starting from the first piece of audio, obtaining audio data from the played audio data includes:
Acquiring audio data objects from the played audio data from the beginning of the first piece of audio to the end of the last audio data object; and combining the acquired audio data objects to obtain audio data.
9. The method of claim 8, wherein the method further comprises: and if the acquired audio data object is a special audio object, disabling the special audio object.
10. The method according to any one of claims 1 to 5, wherein the played video data is cached in a first cache space, and the position of the last video data object in the first cache space is a first position, and the first position is an identification of the last video data object;
and the responding to the recording signal by searching the played video data for the target video data object whose interval from the last video data object in the played video data corresponds to the recording duration comprises: determining the video data object which is located at the S*T-th position before the first position in the first cache space and belongs to the first video as the target video data object, wherein S is the number of video data objects cached per unit time and T is the recording duration;
Said determining a first start identifier at least according to the identifier of said target video data object comprises: starting from the target video data object, searching the first cache space for the first video data object with an I-frame identifier belonging to the first video, and acquiring the position of that video data object in the first cache space; and determining the position of the first video data object with the I-frame identifier in the first cache space as the first start identifier.
11. The method of claim 10, wherein the played audio data is cached in a second cache space, the location of the last audio data object in the second cache space being a second location, the second location being an identification of the last audio data object;
the determining the interval between the first start identification and the identification of the last video data object comprises: determining a position difference between a position corresponding to the first start identifier and the first position as the interval;
and the determining an audio data interval according to the interval, and searching the played audio data for an audio data object spaced from the last audio data object of the played audio data by the audio data interval, the identifier of the found audio data object serving as a second start identifier, comprises: determining the total number B of audio data objects according to A*(gap/S), wherein B is the audio data interval, A is the number of audio data objects cached per unit time, and gap is the interval; and searching the second cache space for an audio data object which belongs to the first video and is located at the B-th position before the second position, and determining the position of the found audio data object in the second cache space as the second start identifier.
12. The method of any one of claims 1 to 5, wherein prior to the responding to the recording signal, the method further comprises: acquiring video data objects and audio data objects in each unit time;
if the number of the video data objects acquired in the unit time is larger than a preset first number, deleting part of the video data objects, wherein the number of the rest video data objects is the preset first number, and the preset first number is the number of the video data objects which can be cached in the unit time;
if the number of the video data objects acquired in unit time is smaller than the preset first number, adding special video objects, wherein the special video objects are marked by special marks, and the sum of the number of the added special video objects and the number of the acquired video data objects is the preset first number;
if the number of the audio data objects acquired in the unit time is larger than a preset second number, deleting part of the audio data objects, wherein the number of the remaining audio data objects is the preset second number, and the preset second number is the number of the audio data objects which can be cached in the unit time;
if the number of the audio data objects acquired in unit time is smaller than the preset second number, adding special audio objects, wherein the special audio objects are marked by special marks, and the sum of the number of the added special audio objects and the number of the acquired audio data objects is the preset second number.
13. The method according to claim 12, wherein the method further comprises: if the number of the residual positions in the first buffer space is greater than or equal to the preset first number, writing the acquired video data objects after the last video data object in the first buffer space;
if the number of the residual positions in the first buffer space is smaller than the preset first number, writing the obtained partial video data objects from the last position to the last position after the last video data object in the first buffer space, and starting writing the residual video data objects from the first position of the first buffer space;
and determining the position of the video data object in the first buffer space as the identification of the video data object.
14. The method according to claim 12, wherein the method further comprises: after the audio data objects with the preset second number are acquired, if the number of the residual positions in the second buffer space is larger than or equal to the preset second number, the acquired audio data objects are written after the last audio data object in the second buffer space;
if the number of the residual positions in the second buffer space is smaller than the preset second number, writing the obtained partial audio data objects from the last position to the last position after the last audio data object in the second buffer space, and starting writing the residual audio data objects from the first position in the second buffer space;
And determining the position of the audio data object in the second buffer space as the identification of the audio data object.
15. An electronic device, the electronic device comprising:
one or more processors;
one or more memories;
the memory stores one or more programs that, when executed by the processor, cause the electronic device to perform the video acquisition method of any of claims 1-14.
16. A computer readable storage medium, wherein a computer program is stored in the computer readable storage medium, which when executed by a processor causes the processor to perform the video acquisition method of any one of claims 1 to 14.
CN202211065711.8A 2022-09-01 2022-09-01 Video acquisition method, electronic equipment and medium Active CN115604540B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211065711.8A CN115604540B (en) 2022-09-01 2022-09-01 Video acquisition method, electronic equipment and medium


Publications (2)

Publication Number Publication Date
CN115604540A CN115604540A (en) 2023-01-13
CN115604540B true CN115604540B (en) 2023-11-14

Family

ID=84842904


Country Status (1)

Country Link
CN (1) CN115604540B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108965831A (en) * 2018-09-05 2018-12-07 北京疯景科技有限公司 Method for processing video frequency, device and intelligent visual door bell
CN110225348A (en) * 2019-06-24 2019-09-10 北京大米科技有限公司 Restorative procedure, device, electronic equipment and the storage medium of video data
CN111225171A (en) * 2020-01-19 2020-06-02 普联技术有限公司 Video recording method, device, terminal equipment and computer storage medium
WO2022042387A1 (en) * 2020-08-26 2022-03-03 华为技术有限公司 Video processing method and electronic device
CN114501002A (en) * 2022-01-04 2022-05-13 烽火通信科技股份有限公司 GOP cache-based pre-alarm video recording method and device


Also Published As

Publication number Publication date
CN115604540A (en) 2023-01-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant