CN116801034A - Method and device for storing audio and video data by client - Google Patents

Method and device for storing audio and video data by client

Info

Publication number
CN116801034A
Authority
CN
China
Prior art keywords
frame
picture
audio
cloud application
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311075977.5A
Other languages
Chinese (zh)
Other versions
CN116801034B (en)
Inventor
肖彬文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Haima Cloud Technology Co ltd
Original Assignee
Haima Cloud Tianjin Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Haima Cloud Tianjin Information Technology Co Ltd
Priority to CN202311075977.5A
Publication of CN116801034A
Application granted
Publication of CN116801034B
Legal status: Active (current)
Anticipated expiration

Landscapes

  • Television Signal Processing For Recording (AREA)

Abstract

The application provides a method and a device for storing audio and video data by a client, an electronic device and a storage medium. The method comprises the following steps: acquiring an audio and video stream of a cloud application sent by a cloud server; for each frame of cloud application picture corresponding to the audio and video stream, determining whether at least two picture groups exist before the picture group containing the frame of cloud application picture; if at least two picture groups exist, determining whether the time interval between the acquisition time of the frame of cloud application picture and the acquisition time of the I frame of the second picture group before the picture group containing the frame of cloud application picture is greater than or equal to a preset first duration; if it is greater than or equal to the preset first duration, deleting the first picture group before the picture group containing the frame of cloud application picture; and generating, based on a recording instruction of a user, target audio and video data from the cloud application pictures and the corresponding audio stored by the client. The scheme makes it possible to record and play back the highlight audio and video generated while the user uses the cloud application.

Description

Method and device for storing audio and video data by client
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and apparatus for storing audio and video data by a client, an electronic device, and a storage medium.
Background
As more and more applications migrate to cloud application modes (such as cloud gaming), users increasingly want to record and play back the highlight audio and video generated while they use a cloud application. Two solutions currently exist for this demand. First, the application itself has a built-in recording and playback function, which is generally only provided by large applications. If the native application does not support recording and playback, the cloud application will not support it either, because clouding an application does not change its original functional characteristics; if the native application does support recording and playback, the clouded application stores the playback audio and video on the cloud server where the cloud application runs, and a client that wants the playback audio and video must transfer it remotely from the cloud server to the local device, which increases the complexity of the overall design. Second, the user relies on a third-party tool to record and play back, which increases the user's operating cost and introduces a dependency on that tool.
In view of this, how to provide a solution that is simple to implement and that can meet the user's demand for recording and playing back highlight audio and video, without requiring the application itself to support recording and playback and without relying on a third-party tool, is a technical problem to be solved.
Disclosure of Invention
In view of the above, the embodiments of the present application provide a method and a device for storing audio and video data by a client, an electronic device and a storage medium, which offer a solution that is simple to implement, does not require the application itself to support recording and playback, does not depend on a third-party tool, and can meet the user's demand for recording and playing back highlight audio and video.
In a first aspect, an embodiment of the present application provides a method for storing audio and video data by a client, which is applied to the client, and includes:
acquiring an audio and video stream of a cloud application sent by a cloud server;
for each frame of cloud application picture corresponding to the audio and video stream, determining whether at least two picture groups exist before the picture group containing the frame of cloud application picture, wherein each picture group comprises an I frame and at least one P frame, and the I frame is the first frame of cloud application picture of the picture group in which it is located;
if at least two picture groups exist before the picture group containing the frame of cloud application picture, determining whether the time interval between the acquisition time of the frame of cloud application picture and the acquisition time of the I frame of the second picture group before the picture group containing the frame of cloud application picture is greater than or equal to a preset first duration;
if the time interval between the acquisition time of the frame of cloud application picture and the acquisition time of the I frame of the second picture group is greater than or equal to the preset first duration, deleting the first picture group before the picture group containing the frame of cloud application picture;
and generating, based on a recording instruction of a user, target audio and video data from the cloud application pictures and the corresponding audio stored by the client.
In a second aspect, an embodiment of the present application further provides an apparatus for storing audio and video data by a client, where the apparatus is applied to the client, and includes:
the acquisition unit is used for acquiring an audio and video stream of the cloud application sent by the cloud server;
the first determining unit is used for determining, for each frame of cloud application picture corresponding to the audio and video stream, whether at least two picture groups exist before the picture group containing the frame of cloud application picture, wherein each picture group comprises an I frame and at least one P frame, and the I frame is the first frame of cloud application picture of the picture group in which it is located;
the second determining unit is used for determining, if at least two picture groups exist before the picture group containing the frame of cloud application picture, whether the time interval between the acquisition time of the frame of cloud application picture and the acquisition time of the I frame of the second picture group before the picture group containing the frame of cloud application picture is greater than or equal to a preset first duration;
the deleting unit is used for deleting the first picture group before the picture group containing the frame of cloud application picture if the time interval between the acquisition time of the frame of cloud application picture and the acquisition time of the I frame of the second picture group is greater than or equal to the preset first duration;
and the generating unit is used for generating, based on a recording instruction of a user, target audio and video data from the cloud application pictures and the corresponding audio stored by the client.
In a third aspect, an embodiment of the present application further provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method for storing audio and video data by a client according to the first aspect.
In a fourth aspect, an embodiment of the present application further provides an electronic device, including: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating over the bus when the electronic device is running, and the processor executing the machine-readable instructions to perform the steps of the method for storing audio and video data by a client according to the first aspect.
In summary, with the method and device for storing audio and video data by a client, the electronic device and the storage medium provided by the embodiments of the present application, for each frame of cloud application picture corresponding to the audio and video stream of the cloud application sent by the cloud server, the time interval between the acquisition time of the frame of cloud application picture and the acquisition time of the I frame of the second picture group before the picture group containing the frame of cloud application picture is compared with a preset first duration, and when the time interval is greater than or equal to the preset first duration, the first picture group before the picture group containing the frame of cloud application picture is deleted. The audio and video stored by the client can thus always be kept at approximately the preset first duration, which makes it possible to record and play back the highlight audio and video generated while the user uses the cloud application.
Drawings
Fig. 1 is a flowchart of a method for storing audio and video data by a client according to an embodiment of the present application;
Fig. 2 is a schematic diagram of the buffer areas involved in another method for storing audio and video data by a client according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a device for storing audio and video data by a client according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described with reference to the accompanying drawings in the embodiments of the present application, and it should be understood that the drawings in the present application are for the purpose of illustration and description only and are not intended to limit the scope of the present application. In addition, it should be understood that the schematic drawings are not drawn to scale. A flowchart, as used in this disclosure, illustrates operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be implemented out of order and that steps without logical context may be performed in reverse order or concurrently. Moreover, one or more other operations may be added to or removed from the flow diagrams by those skilled in the art under the direction of the present disclosure.
In addition, the described embodiments are only some, but not all, embodiments of the application. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.
It should be noted that the term "comprising" will be used in embodiments of the application to indicate the presence of the features stated hereafter, but not to exclude the addition of other features.
Referring to Fig. 1, a method for storing audio and video data by a client according to an embodiment of the present application is applied to a client, and includes:
s10, acquiring an audio and video stream of a cloud application sent by a cloud server;
s11, judging whether at least two picture groups exist before a picture group where the frame cloud application picture is located for each frame cloud application picture corresponding to the audio and video stream, wherein each picture group comprises an I frame and at least one P frame, and the I frame is a first frame cloud application picture of the picture group where the I frame is located;
in this embodiment, it should be noted that, each frame of cloud application picture is an I frame or a P frame, the first frame of cloud application picture is an I frame, each I frame and the P frame between the I frame and the first subsequent I frame form a group of pictures (i.e., GOP, a GOP is a group of continuous pictures, the first frame image of GOP is an I frame), where the I frame is also called intra picture (intra picture) or a key frame, and is a full frame compressed encoded frame, and when decoding, it is not necessary to refer to other pictures, and only the data of the I frame can reconstruct a complete image; the P Frame is also called a Predictive Frame (Predictive Frame), which is used for performing compression coding on the difference information of a Frame image corresponding to the P Frame and a previous Frame image, so that the coding of the P Frame needs to depend on the P Frame or the I Frame in front of the P Frame, meanwhile, the P Frame cannot be independently decoded, and the previous Frame image and the difference information must be summed up to reconstruct the complete P Frame image during the P Frame decoding. In general, the client directly decodes and renders the encoded and compressed audio and video data sent by the cloud server after receiving the encoded and compressed audio and video data, and in order to cache the audio and video data, a caching mechanism of the client asynchronously copies and caches the audio and video data; the video data and the audio data simultaneously have 2 kinds of buffer areas in frame units and GOP units, respectively; the client sets a fixed recording duration (i.e. a preset first time duration, for example, 20 s), and if the recording duration is exceeded, removes the audio/video data in GOP units. Taking video data as an example, 2 types of video buffers can be created at the client in a specific implementation: the first is a sequential buffer area taking a cloud application picture frame as a minimum buffer unit; the second is a sequential buffer with GOP as the smallest buffer unit. When the second I frame appears in the first buffer, all frames before the second I frame (excluding the second I frame) can be moved to the second buffer as a GOP unit, and the second I frame is continuously buffered as a start frame of the first buffer, and obviously, both buffers use the I frame as a start frame: the first buffer will only have one I-frame, while the second buffer will have multiple I-frames. Each time a frame of picture is buffered in the first buffer, it is determined whether at least two GOP's have been buffered in the second buffer.
S12, if at least two picture groups exist before the picture group containing the frame of cloud application picture, determining whether the time interval between the acquisition time of the frame of cloud application picture and the acquisition time of the I frame of the second picture group before the picture group containing the frame of cloud application picture is greater than or equal to a preset first duration;
If at least two GOPs have been buffered in the second buffer, the acquisition time of the current frame of picture (a timestamp that the cloud server adds to the attribute information of the cloud application picture when it captures the picture, indicating when the picture was captured) is compared with the acquisition time of the I frame of the second GOP buffered in the second buffer. If the difference between the two acquisition times reaches the recording duration set by the user, the data of the first GOP buffered in the second buffer is deleted; if the difference does not reach the recording duration, buffering continues and the above steps are repeated.
Fig. 2 is a schematic diagram of the two buffers. In Fig. 2, the output frame rate of the cloud application picture is 30 fps; GOP1, GOP2 and GOP3 are currently buffered in the second buffer, where GOP1 contains I frame I1 and P frames P1, P2, …, P239, GOP2 contains I frame I2 and P frames P240, P241, …, P478, and GOP3 contains I frame I3 and P frames P479, P480, …, P717; I frame I4 and P frames P718, P719, …, P806 are currently buffered in the first buffer. Assume the acquisition time of P806 is 27.965 s and the acquisition time of I frame I2 of the second GOP in the second buffer (i.e. GOP2) is 8 s; the time interval between the acquisition times of P806 and I2 is 19.965 s < 20 s, so buffering simply continues. Assume the first buffer then buffers P frame P807 with an acquisition time of 28 s; the time interval between the acquisition times of P807 and I2 is 20 s ≥ 20 s, so the first GOP in the second buffer (i.e. GOP1) is deleted. Thereafter, GOP2 in the second buffer becomes the first GOP and GOP3 becomes the second GOP, and the acquisition times of the video frames buffered in the first buffer are compared with the acquisition time of I frame I3 of GOP3.
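Continuing the same illustrative sketch (assumed names, not the patent's own code), the check performed each time a frame is buffered could look like the following; the trailing comment restates the Fig. 2 numbers.

def evict_expired_gop(gop_buffer, latest_capture_ts, first_duration=20.0):
    # Delete the oldest GOP once the newest buffered frame is at least
    # `first_duration` seconds newer than the I frame of the second GOP.
    if len(gop_buffer) < 2:
        return False                 # fewer than two complete GOPs: keep everything
    second_gop_i_ts = gop_buffer[1][0].capture_ts
    if latest_capture_ts - second_gop_i_ts >= first_duration:
        gop_buffer.popleft()         # e.g. GOP1 is dropped in the Fig. 2 walkthrough
        return True
    return False

# Fig. 2 numbers: I2 of the second GOP is captured at 8 s.
# P806 at 27.965 s -> 19.965 s < 20 s, nothing is deleted;
# P807 at 28.000 s -> 20 s >= 20 s, the first GOP (GOP1) is deleted.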
S13, if the time interval between the acquisition time of the frame of cloud application picture and the acquisition time of the I frame of the second picture group is greater than or equal to the preset first duration, deleting the first picture group before the picture group containing the frame of cloud application picture;
It should be noted that the audio data is cached with two buffer areas of the same kind as those used for the video data, and the basic caching logic is similar. The main difference is that the audio GOPs take the video GOPs as their reference: whenever the video side moves one video GOP into the video GOP buffer, the corresponding buffered audio data is also moved into the audio GOP buffer as one audio GOP. It will be appreciated that, since audio data has no GOP concept of its own, the GOP buffer for the audio data is built from the GOPs of the video data in order to keep audio and video synchronized; that is, the video GOPs correspond one to one with the audio GOPs, so that the audio and video data stay aligned in time and share the same minimum storage unit. In general, the caching logic uses a first-in first-out mechanism.
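A minimal sketch of the audio-side mirroring, continuing the same illustrative example with assumed names: audio frames accumulate in a frame-level buffer, are moved as one audio "GOP" exactly when the video side closes a GOP, and are evicted together with the corresponding video GOP.

from collections import deque

class AudioCache:
    # Audio frames are grouped into pseudo-GOPs that track the video GOPs one to one.
    def __init__(self) -> None:
        self.frame_buffer = []     # audio frames received since the last video GOP boundary
        self.gop_buffer = deque()  # one entry per video GOP, first-in first-out

    def push(self, audio_frame) -> None:
        self.frame_buffer.append(audio_frame)

    def close_gop(self) -> None:
        # Called at the moment the video side moves a video GOP into its GOP buffer,
        # so audio and video share the same minimum storage unit.
        self.gop_buffer.append(self.frame_buffer)
        self.frame_buffer = []

    def drop_oldest_gop(self) -> None:
        # Called together with the video-side eviction (first-in first-out).
        if self.gop_buffer:
            self.gop_buffer.popleft()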
S14, generating, based on a recording instruction of the user, target audio and video data from the cloud application pictures and the corresponding audio stored by the client.
In this embodiment, it should be noted that a recording instruction of the user means that the user actively triggers retrieval of the buffered audio and video; the buffered audio and video data is then encapsulated according to the standard interface call flow of ffmpeg (a general-purpose tool set for recording and converting audio and video), for example into the MP4 format, while the client continues buffering until the user exits.
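The embodiment only states that the buffered data is packaged through the standard ffmpeg call flow; as a hedged illustration, an equivalent result can be sketched by handing dumped elementary streams to the ffmpeg command line. The file names are placeholders, and invoking the command line here merely stands in for calling the ffmpeg library interfaces directly.

import subprocess

def package_to_mp4(video_es: str, audio_es: str, output_path: str) -> None:
    # Remux the buffered video and audio elementary streams into an MP4 container.
    subprocess.run(
        ["ffmpeg", "-y",
         "-i", video_es,   # e.g. the dumped H.264 frames from the GOP buffers
         "-i", audio_es,   # e.g. the dumped AAC frames from the audio GOP buffers
         "-c", "copy",     # no re-encoding: container packaging only
         output_path],     # e.g. "highlight.mp4"
        check=True,
    )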
According to the method for storing audio and video data by a client provided by the embodiment of the present application, for each frame of cloud application picture corresponding to the audio and video stream of the cloud application sent by the cloud server, the time interval between the acquisition time of the frame of cloud application picture and the acquisition time of the I frame of the second picture group before the picture group containing the frame of cloud application picture is compared with the preset first duration, and when the time interval is greater than or equal to the preset first duration, the first picture group before the picture group containing the frame of cloud application picture is deleted. The audio and video stored by the client can thus always be kept at approximately the preset first duration, so that the highlight audio and video generated while the user uses the cloud application can be recorded and played back.
On the basis of the foregoing method embodiment, the method may further include:
sending an I frame interval setting request to the cloud server, so that the cloud server sets the I frame encoding interval of the encoder, in frames, according to the I frame interval setting request.
In this embodiment, it should be noted that when the rendering frame rate of the cloud application picture drops sharply (for example, when the cloud application picture changes from intense motion to little motion), the frame rate at which the cloud server captures the cloud application picture also drops sharply. Because the encoder on the cloud server encodes an I frame every fixed number of frames, the time interval between the I frames produced by the encoder then grows sharply, so the time interval between the I frames received by the client grows accordingly, and the duration of the buffered audio and video data can end up far exceeding the preset first duration. To solve this problem, the client can send an I frame interval setting request to the cloud server, so that the cloud server sets the I frame encoding interval of the encoder, in frames (i.e. how many frames of picture are encoded for each I frame), according to the request, ensuring that the time interval between the I frames output by the encoder allows the client to buffer roughly the preset first duration of audio and video data.
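The disclosure does not define the wire format of the I frame interval setting request; purely as a hypothetical illustration, such a control message might be sent as follows, where the field names and the control_channel object are assumptions rather than anything specified by the patent.

import json

def request_i_frame_interval(control_channel, frames_per_i_frame: int) -> None:
    # Ask the cloud server to encode one I frame every `frames_per_i_frame` frames.
    message = {
        "type": "i_frame_interval_setting",     # hypothetical message type
        "interval_frames": frames_per_i_frame,  # e.g. 60 -> one I frame every 2 s at 30 fps
    }
    control_channel.send(json.dumps(message))   # control_channel is an assumed transport object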
On the basis of the foregoing method embodiment, the method may further include:
determining whether a new I frame has been received when a preset second duration elapses; if no new I frame has been received when the preset second duration elapses, sending an I frame request to the cloud server, so that the cloud server, based on the I frame request, encodes the latest cloud application picture to be encoded as an I frame through the encoder and sends the I frame to the client.
In this embodiment, it should be noted that, to keep the duration of the audio and video data buffered by the client from far exceeding the preset first duration, besides having the client send an I frame interval setting request to the cloud server to control the I frame generation interval as in the foregoing embodiment, the client may also actively and periodically request an I frame from the cloud server. This ensures that the time interval between the I frames received by the client does not become too large, and thus that the duration of the audio and video data buffered by the client stays at approximately the preset first duration.
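A hedged sketch of that periodic request, again with assumed names and an assumed 2 s value for the preset second duration: the client tracks when the last I frame arrived and, if none arrives within the preset second duration, asks the cloud server for one.

import time

class IFrameWatchdog:
    # Ask the cloud server for an I frame if none arrives within the preset second duration.
    def __init__(self, control_channel, second_duration: float = 2.0) -> None:
        self.control_channel = control_channel  # assumed transport object, as above
        self.second_duration = second_duration  # preset second duration (value assumed here)
        self.last_i_frame_at = time.monotonic()

    def on_frame_received(self, is_i_frame: bool) -> None:
        now = time.monotonic()
        if is_i_frame:
            self.last_i_frame_at = now
        elif now - self.last_i_frame_at >= self.second_duration:
            self.control_channel.send('{"type": "i_frame_request"}')  # hypothetical message
            self.last_i_frame_at = now  # reset to avoid flooding the server with requests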
On the basis of the foregoing method embodiment, the generating, based on a recording instruction of the user, target audio and video data from the cloud application pictures and the corresponding audio stored by the client may include:
transcoding, based on the recording instruction of the user, the cloud application pictures and the corresponding audio stored by the client, respectively, to obtain transcoded video and transcoded audio;
and intercepting, from the transcoded video and the transcoded audio respectively, the data of the last preset first duration, and encapsulating the intercepted data into audio and video data in a predetermined format.
In this embodiment, it should be noted that in the foregoing embodiments the duration of the recorded audio and video may be greater than the preset first duration. To obtain audio and video data of exactly the preset first duration, the audio and video buffered by the client can each be transcoded, the data of the last preset first duration can then be intercepted from the transcoded video and the transcoded audio respectively, and the intercepted data can be encapsulated into audio and video data in a predetermined format (for example, packaged into an MP4 video file through the MP4 encapsulation interface of ffmpeg).
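As a hedged illustration of the interception step, the ffmpeg command line can seek relative to the end of an input with -sseof, which keeps only the trailing segment; the file names and the 20 s value are examples, since the embodiment itself only specifies transcoding, intercepting the last preset first duration, and packaging.

import subprocess

def keep_last_seconds(transcoded_input: str, output_mp4: str, seconds: float = 20.0) -> None:
    # Keep only the trailing `seconds` of the transcoded file and package it as MP4.
    subprocess.run(
        ["ffmpeg", "-y",
         "-sseof", f"-{seconds}",  # seek to `seconds` before the end of the input
         "-i", transcoded_input,
         "-c", "copy",             # stream copy: the input is already transcoded
         output_mp4],
        check=True,
    )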
Referring to Fig. 3, an apparatus for storing audio and video data by a client according to an embodiment of the present application is applied to a client, and includes:
The acquiring unit 30 is configured to acquire an audio and video stream of a cloud application sent by the cloud server;
The first determining unit 31 is configured to determine, for each frame of cloud application picture corresponding to the audio and video stream, whether at least two picture groups exist before the picture group containing the frame of cloud application picture, wherein each picture group comprises an I frame and at least one P frame, and the I frame is the first frame of cloud application picture of the picture group in which it is located;
The second determining unit 32 is configured to determine, if at least two picture groups exist before the picture group containing the frame of cloud application picture, whether the time interval between the acquisition time of the frame of cloud application picture and the acquisition time of the I frame of the second picture group before the picture group containing the frame of cloud application picture is greater than or equal to a preset first duration;
The deleting unit 33 is configured to delete the first picture group before the picture group containing the frame of cloud application picture if the time interval between the acquisition time of the frame of cloud application picture and the acquisition time of the I frame of the second picture group is greater than or equal to the preset first duration;
The generating unit 34 is configured to generate, based on a recording instruction of the user, target audio and video data from the cloud application pictures and the corresponding audio stored by the client.
According to the apparatus for storing audio and video data by a client provided by the embodiment of the present application, for each frame of cloud application picture corresponding to the audio and video stream of the cloud application sent by the cloud server, the time interval between the acquisition time of the frame of cloud application picture and the acquisition time of the I frame of the second picture group before the picture group containing the frame of cloud application picture is compared with the preset first duration, and when the time interval is greater than or equal to the preset first duration, the first picture group before the picture group containing the frame of cloud application picture is deleted. The audio and video stored by the client can thus always be kept at approximately the preset first duration, so that the highlight audio and video generated while the user uses the cloud application can be recorded and played back.
On the basis of the foregoing apparatus embodiment, the apparatus may further include:
The first sending unit is configured to send an I frame interval setting request to the cloud server, so that the cloud server sets the I frame encoding interval of the encoder, in frames, according to the I frame interval setting request.
On the basis of the foregoing apparatus embodiment, the apparatus may further include:
The second sending unit is configured to determine whether a new I frame has been received when a preset second duration elapses and, if no new I frame has been received when the preset second duration elapses, to send an I frame request to the cloud server, so that the cloud server, based on the I frame request, encodes the latest cloud application picture to be encoded as an I frame through the encoder and sends the I frame to the client.
On the basis of the foregoing apparatus embodiment, the generating unit may be configured to:
transcode, based on a recording instruction of the user, the cloud application pictures and the corresponding audio stored by the client, respectively, to obtain transcoded video and transcoded audio;
and intercept, from the transcoded video and the transcoded audio respectively, the data of the last preset first duration, and encapsulate the intercepted data into audio and video data in a predetermined format.
The implementation process of the apparatus for storing audio and video data by a client provided by the embodiment of the present application is consistent with, and achieves the same effect as, the method for storing audio and video data by a client provided by the embodiment of the present application, so the details are not repeated here.
As shown in Fig. 4, an electronic device provided in an embodiment of the present application includes: a processor 40, a memory 41 and a bus 42. The memory 41 stores machine-readable instructions executable by the processor 40; when the electronic device is running, the processor 40 and the memory 41 communicate via the bus 42, and the processor 40 executes the machine-readable instructions to perform the steps of the method for storing audio and video data by a client described above.
Specifically, the memory 41 and the processor 40 can be general-purpose memories and processors, and are not limited herein, and the method for storing audio and video data by the client can be performed when the processor 40 runs a computer program stored in the memory 41.
Corresponding to the method for storing audio and video data by a client, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when run by a processor, performs the steps of the method for storing audio and video data by a client described above.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system and apparatus may refer to corresponding procedures in the method embodiments, and are not repeated in the present disclosure. In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, and the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, and for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, indirect coupling or communication connection of devices or modules, electrical, mechanical, or other form.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a usb disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk, etc.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily appreciate variations or alternatives within the scope of the present application. Therefore, the protection scope of the application is subject to the protection scope of the claims.

Claims (10)

1. A method for storing audio and video data by a client, applied to the client, characterized by comprising the following steps:
acquiring an audio and video stream of a cloud application sent by a cloud server;
for each frame of cloud application picture corresponding to the audio and video stream, determining whether at least two picture groups exist before the picture group containing the frame of cloud application picture, wherein each picture group comprises an I frame and at least one P frame, and the I frame is the first frame of cloud application picture of the picture group in which it is located;
if at least two picture groups exist before the picture group containing the frame of cloud application picture, determining whether the time interval between the acquisition time of the frame of cloud application picture and the acquisition time of the I frame of the second picture group before the picture group containing the frame of cloud application picture is greater than or equal to a preset first duration;
if the time interval between the acquisition time of the frame of cloud application picture and the acquisition time of the I frame of the second picture group is greater than or equal to the preset first duration, deleting the first picture group before the picture group containing the frame of cloud application picture;
and generating, based on a recording instruction of a user, target audio and video data from the cloud application pictures and the corresponding audio stored by the client.
2. The method as recited in claim 1, further comprising:
sending an I frame interval setting request to the cloud server, so that the cloud server sets the I frame encoding interval of the encoder, in frames, according to the I frame interval setting request.
3. The method as recited in claim 1, further comprising:
determining whether a new I frame has been received when a preset second duration elapses, and, if no new I frame has been received when the preset second duration elapses, sending an I frame request to the cloud server, so that the cloud server, based on the I frame request, encodes the latest cloud application picture to be encoded as an I frame through the encoder and sends the I frame to the client.
4. The method of any one of claims 1 to 3, wherein the generating, based on a recording instruction of a user, target audio and video data from the cloud application pictures and the corresponding audio stored by the client comprises:
transcoding, based on the recording instruction of the user, the cloud application pictures and the corresponding audio stored by the client, respectively, to obtain transcoded video and transcoded audio;
and intercepting, from the transcoded video and the transcoded audio respectively, the data of the last preset first duration, and encapsulating the intercepted data into audio and video data in a predetermined format.
5. A device for storing audio and video data by a client, applied to the client, comprising:
the acquisition unit is used for acquiring an audio and video stream of the cloud application sent by the cloud server;
the first determining unit is used for determining, for each frame of cloud application picture corresponding to the audio and video stream, whether at least two picture groups exist before the picture group containing the frame of cloud application picture, wherein each picture group comprises an I frame and at least one P frame, and the I frame is the first frame of cloud application picture of the picture group in which it is located;
the second determining unit is used for determining, if at least two picture groups exist before the picture group containing the frame of cloud application picture, whether the time interval between the acquisition time of the frame of cloud application picture and the acquisition time of the I frame of the second picture group before the picture group containing the frame of cloud application picture is greater than or equal to a preset first duration;
the deleting unit is used for deleting the first picture group before the picture group containing the frame of cloud application picture if the time interval between the acquisition time of the frame of cloud application picture and the acquisition time of the I frame of the second picture group is greater than or equal to the preset first duration;
and the generating unit is used for generating, based on a recording instruction of a user, target audio and video data from the cloud application pictures and the corresponding audio stored by the client.
6. The apparatus as recited in claim 5, further comprising:
the first sending unit is used for sending an I frame interval setting request to the cloud server, so that the cloud server sets the I frame encoding interval of the encoder, in frames, according to the I frame interval setting request.
7. The apparatus as recited in claim 5, further comprising:
the second sending unit is used for determining whether a new I frame has been received when a preset second duration elapses and, if no new I frame has been received when the preset second duration elapses, sending an I frame request to the cloud server, so that the cloud server, based on the I frame request, encodes the latest cloud application picture to be encoded as an I frame through the encoder and sends the I frame to the client.
8. The apparatus according to any one of claims 5 to 7, wherein the generating unit is configured to:
transcode, based on a recording instruction of the user, the cloud application pictures and the corresponding audio stored by the client, respectively, to obtain transcoded video and transcoded audio;
and intercept, from the transcoded video and the transcoded audio respectively, the data of the last preset first duration, and encapsulate the intercepted data into audio and video data in a predetermined format.
9. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when run by a processor, performs the steps of the method for storing audio and video data by a client according to any one of claims 1 to 4.
10. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating over the bus when the electronic device is running, and the processor executing the machine-readable instructions to perform the steps of the method for storing audio and video data by a client according to any one of claims 1 to 4.
CN202311075977.5A 2023-08-25 2023-08-25 Method and device for storing audio and video data by client Active CN116801034B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311075977.5A CN116801034B (en) 2023-08-25 2023-08-25 Method and device for storing audio and video data by client

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311075977.5A CN116801034B (en) 2023-08-25 2023-08-25 Method and device for storing audio and video data by client

Publications (2)

Publication Number Publication Date
CN116801034A true CN116801034A (en) 2023-09-22
CN116801034B CN116801034B (en) 2023-11-03

Family

ID=88039985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311075977.5A Active CN116801034B (en) 2023-08-25 2023-08-25 Method and device for storing audio and video data by client

Country Status (1)

Country Link
CN (1) CN116801034B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017107649A1 (en) * 2015-12-22 2017-06-29 北京奇虎科技有限公司 Video transmission method and device
CN110113621A (en) * 2018-02-01 2019-08-09 腾讯科技(深圳)有限公司 Playing method and device, storage medium, the electronic device of media information
CN111083520A (en) * 2018-10-19 2020-04-28 杭州海康威视系统技术有限公司 Method and apparatus for storing video data
CN110366000A (en) * 2019-08-30 2019-10-22 北京青岳科技有限公司 A kind of video pictures transmission method and system
CN110784740A (en) * 2019-11-25 2020-02-11 北京三体云时代科技有限公司 Video processing method, device, server and readable storage medium
CN112843676A (en) * 2021-01-12 2021-05-28 腾讯科技(深圳)有限公司 Data processing method, device, terminal, server and storage medium
WO2023024290A1 (en) * 2021-08-23 2023-03-02 歌尔科技有限公司 Video recording method, camera device, control terminal, and video recording system
WO2023142716A1 (en) * 2022-01-27 2023-08-03 腾讯科技(深圳)有限公司 Encoding method and apparatus, real-time communication method and apparatus, device, and storage medium
CN115150638A (en) * 2022-06-29 2022-10-04 深信服科技股份有限公司 Data transmission method, device, equipment and storage medium based on cloud desktop
CN116389437A (en) * 2023-03-13 2023-07-04 阿里巴巴(中国)有限公司 Video data transmission method, device, storage medium and system
CN116570916A (en) * 2023-05-25 2023-08-11 北京蔚领时代科技有限公司 Picture exception processing method and device based on cloud game

Also Published As

Publication number Publication date
CN116801034B (en) 2023-11-03

Similar Documents

Publication Publication Date Title
US7231132B1 (en) Trick-mode processing for digital video
US6871006B1 (en) Processing of MPEG encoded video for trick mode operation
US7023924B1 (en) Method of pausing an MPEG coded video stream
US6959116B2 (en) Largest magnitude indices selection for (run, level) encoding of a block coded picture
KR102317938B1 (en) Division video distributed decoding method and system for tiled streaming
US11700419B2 (en) Re-encoding predicted picture frames in live video stream applications
JP7151004B2 (en) Interruptible video transcoding
JP2002077838A (en) Method and system for transmitting alternative visual data during interruption of video transmission system
JP2017158154A (en) Information processing system
CN111726657A (en) Live video playing processing method and device and server
KR100940499B1 (en) Image processing method, image processing device, computer readable recording medium for storing image processing program and image browsing system
CN114584769A (en) Visual angle switching method and device
CN116801034B (en) Method and device for storing audio and video data by client
KR20140007893A (en) A method for optimizing a video stream
WO2014006921A1 (en) Electronic devices for signaling sub-picture based hypothetical reference decoder parameters
JPH09298749A (en) Moving image distributing method and executing device for the same
TWI680668B (en) Screen image transmission method, image restoration method, screen image transmission system, image restoration system, screen image transmission program, image restoration program, image compression method, image compression system, and image compression program
KR100579012B1 (en) A System for Transporting Moving Picture Stream Having Multi Bit Rates Support In Real Time
JP7172055B2 (en) Processing device, processing program and processing method
JP7221406B2 (en) In-manifest update event
CN111263159A (en) Frame-by-frame clipping optimization method based on H.264 compression standard
JP7378035B2 (en) Conversion device, decoding device, conversion method and decoding method
JP7434561B2 (en) MPD expiration date processing model
KR20090039232A (en) Method for encoding and decoding moving images
CN111954019A (en) Streaming media coding and decoding method and system for realizing low-delay ultrahigh-definition live broadcast

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231218

Address after: 230031 Room 672, 6/F, Building A3A4, Zhong'an Chuanggu Science Park, No. 900, Wangjiang West Road, High-tech Zone, Hefei, Anhui

Patentee after: Anhui Haima Cloud Technology Co.,Ltd.

Address before: 301700 room 2d25, Building 29, No.89 Heyuan Road, Jingjin science and Technology Valley Industrial Park, Wuqing District, Tianjin

Patentee before: HAIMAYUN (TIANJIN) INFORMATION TECHNOLOGY CO.,LTD.