CN116052701A - Audio processing method and electronic equipment - Google Patents

Audio processing method and electronic equipment

Info

Publication number
CN116052701A
CN116052701A (application CN202210793866.7A)
Authority
CN
China
Prior art keywords
audio data
time
electronic device
audio
media stream
Prior art date
Legal status
Granted
Application number
CN202210793866.7A
Other languages
Chinese (zh)
Other versions
CN116052701B
Inventor
肖瑶
林晨
Current Assignee
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date
Filing date
Publication date
Application filed by Honor Device Co Ltd
Priority to CN202210793866.7A
Publication of CN116052701A
Application granted
Publication of CN116052701B
Active legal status
Anticipated expiration


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis, using predictive techniques
    • G10L19/16 - Vocoder architecture
    • G10L19/167 - Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

The application provides an audio processing method and an electronic device. The method is applied to an electronic device that supports playing a first media stream, and includes: receiving a recording instruction during playback of the first media stream, where the recording instruction instructs the electronic device to record the audio data corresponding to a first time, the first time being the playback position of the first media stream when the recording instruction is received; and encoding first audio data based on the recording instruction to obtain second audio data in the advanced audio coding (AAC) format, where the first audio data is unencoded pulse code modulation (PCM) audio data that is buffered in the electronic device and corresponds to the first time. This alleviates the high CPU occupancy of an electronic device that supports both playing and recording a media stream while the media stream is being played.

Description

Audio processing method and electronic equipment
Technical Field
The present disclosure relates to the field of audio encoding, and in particular, to an audio processing method and an electronic device.
Background
As living standards improve, the streaming-media entertainment industry has grown steadily. For example, watching movies on a mobile terminal has become a common way to spend leisure time.
When a user is watching a movie on a mobile terminal, the user may want to save a highlight clip in the local storage space of the mobile terminal for later review or for sharing with others. Generally, an audio/video recording function works as follows: audio/video data are recorded and encoded while the video plays, and the encoded data are buffered. After a user-initiated recording instruction is received, the encoded data corresponding to the time period selected by the user are saved to the local device.
However, with such an audio/video recording function, video stuttering often occurs while the mobile terminal plays the media stream, because the function raises the CPU occupancy; this degrades the user experience.
Disclosure of Invention
In a first aspect, the present application provides an audio processing method applied to an electronic device that supports playing a first media stream. The method includes: receiving a recording instruction during playback of the first media stream, where the recording instruction instructs the electronic device to record the audio data corresponding to a first time, the first time being the playback position of the first media stream when the recording instruction is received; and encoding first audio data based on the recording instruction to obtain second audio data in the advanced audio coding (AAC) format, where the first audio data is unencoded pulse code modulation (PCM) audio data that is buffered in the electronic device and corresponds to the first time.
In this way, the buffered PCM audio data is encoded only after the recording instruction is received, rather than being recorded and encoded simultaneously throughout playback of the media stream. This reduces, as far as possible, the CPU occupancy of an electronic device that supports a media-stream recording function during playback, alleviates video stuttering and the slow overall response of such a device, and improves the user's entertainment experience.
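As a minimal sketch of the deferred-encoding step (not the patent's implementation), assuming Android's MediaCodec API and 16-bit stereo 44.1 kHz PCM; the class and parameter names are illustrative:

```java
import android.media.MediaCodec;
import android.media.MediaCodecInfo;
import android.media.MediaFormat;
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/** Deferred encoder: runs only after the recording instruction arrives. */
public final class DeferredAacEncoder {
    private static final int SAMPLE_RATE = 44100;   // assumed PCM parameters
    private static final int CHANNELS = 2;
    private static final int BYTES_PER_SEC = SAMPLE_RATE * CHANNELS * 2; // 16-bit

    /** Encodes the buffered PCM clip (the "first audio data") to raw AAC. */
    public static byte[] encode(byte[] pcm) throws Exception {
        MediaFormat fmt = MediaFormat.createAudioFormat(
                MediaFormat.MIMETYPE_AUDIO_AAC, SAMPLE_RATE, CHANNELS);
        fmt.setInteger(MediaFormat.KEY_AAC_PROFILE,
                MediaCodecInfo.CodecProfileLevel.AACObjectLC);
        fmt.setInteger(MediaFormat.KEY_BIT_RATE, 128_000);
        MediaCodec codec = MediaCodec.createEncoderByType(MediaFormat.MIMETYPE_AUDIO_AAC);
        codec.configure(fmt, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
        codec.start();

        ByteArrayOutputStream aac = new ByteArrayOutputStream();
        MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
        int offset = 0;
        boolean inputDone = false, outputDone = false;
        while (!outputDone) {
            if (!inputDone) {
                int in = codec.dequeueInputBuffer(10_000);
                if (in >= 0) {
                    ByteBuffer buf = codec.getInputBuffer(in);
                    int chunk = Math.min(buf.remaining(), pcm.length - offset);
                    long ptsUs = offset * 1_000_000L / BYTES_PER_SEC;
                    buf.put(pcm, offset, chunk);
                    offset += chunk;
                    inputDone = offset >= pcm.length;
                    codec.queueInputBuffer(in, 0, chunk, ptsUs,
                            inputDone ? MediaCodec.BUFFER_FLAG_END_OF_STREAM : 0);
                }
            }
            int out = codec.dequeueOutputBuffer(info, 10_000);
            if (out >= 0) {
                if ((info.flags & MediaCodec.BUFFER_FLAG_CODEC_CONFIG) == 0) {
                    ByteBuffer buf = codec.getOutputBuffer(out);
                    byte[] frame = new byte[info.size];
                    buf.get(frame);
                    aac.write(frame);                    // the second audio data
                }
                codec.releaseOutputBuffer(out, false);
                outputDone = (info.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0;
            }
        }
        codec.stop();
        codec.release();
        return aac.toByteArray();
    }
}
```

Because this loop runs only once, on demand, the steady-state cost while the media stream plays is just copying raw PCM into the buffer.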
In one possible implementation, encoding the first audio data based on the recording instruction to obtain the second audio data in AAC format includes: determining the first audio data from third audio data, where the third audio data is the audio data buffered in the electronic device; and encoding the first audio data based on the recording instruction to obtain the second audio data.
Illustratively, the third audio data is simply whatever audio data has been buffered in the electronic device; its start and end times may be independent of the first time. For example, the third audio data is the audio data corresponding to a target data content in the first media stream, where the target data content can be understood as the media stream data of a highlight moment. After receiving the recording instruction, the electronic device then selects, from the buffered third audio data, the audio data within a period before the first time and a period after the first time as the first audio data.
The third audio data may also be audio data corresponding to the first time. For example, the third audio data is all audio data of the first media stream before the first time; or the third audio data is the audio data of the first media stream in a first time period before the first time; or the third audio data includes the audio data of the first media stream in a first time period before the first time and the audio data of the first media stream in a second time period after the first time. This is not limited here.
In one possible implementation manner, the duration of the first audio data is less than or equal to a first encoding duration, and the duration of the third audio data is greater than the first encoding duration and less than or equal to a second encoding duration.
Thus, on the one hand, the duration of the buffered third audio data is less than or equal to the second encoding duration (for example, 20 s or 30 s), which reduces the buffering pressure on the electronic device.
On the other hand, the duration of the third audio data is longer than the first encoding duration. In an audio-video synthesis scenario where the first media stream contains both audio and video data, the first video data may be written into the media muxer slowly; while it is being written, the electronic device keeps updating the buffered third audio data in real time. If no margin were reserved, the time period and content of the first audio data to be encoded (and of the resulting second audio data) could no longer match the time period and content of the first video data by the time the audio is written into the media muxer. Reserving the margin therefore reduces the probability of such mismatches and of program execution errors, shortens the electronic device's response time to the recording instruction, and improves the user experience.
In one possible implementation, before the recording instruction is received, the method further includes: buffering the third audio data in a first array, where each array element in the first array records a buffer of audio data and the timestamp corresponding to the audio data in that buffer.
For example, if the ending time of the third audio data is the first time, the electronic device may record and buffer the third audio data before receiving the recording instruction.
In one possible implementation, before the recording instruction is received, the method further includes: buffering the audio data corresponding to a third time period in the third audio data, where the ending time of the third time period is the first time. After the recording instruction is received, and before the first audio data is encoded based on the recording instruction, the method further includes: buffering the audio data corresponding to a fourth time period in the third audio data, where the starting time of the fourth time period is the first time.
Illustratively, the electronic device supports recording the aforementioned PCM-format third audio data both forward and backward. That is, when the ending time of the third audio data is later than the first time and its starting time is earlier than the first time, the electronic device may record and buffer the audio data of the third time period forward before receiving the recording instruction, and record and buffer the audio data of the fourth time period backward after receiving the recording instruction, thereby obtaining the complete third audio data.
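A sketch of how the two halves might be assembled; PlaybackClock and ChunkSource are hypothetical stand-ins (not from the patent) for the player position and the 64 ms chip-interface reads described later:

```java
import java.util.List;

/** Assembles the complete third audio data when both forward and backward
 *  recording are supported. */
final class ForwardBackwardRecorder {
    interface PlaybackClock { long nowMs(); }
    interface ChunkSource { byte[] read64ms(); }

    private final PlaybackClock clock;
    private final ChunkSource source;

    ForwardBackwardRecorder(PlaybackClock clock, ChunkSource source) {
        this.clock = clock; this.source = source;
    }

    /** Called at the first time t1: the third time period (ending at t1) is
     *  already buffered; keep capturing until the fourth time period
     *  (starting at t1, lasting d2 ms) is buffered too. */
    void onRecordingInstruction(long t1Ms, long d2Ms, List<byte[]> buffer) {
        long deadline = t1Ms + d2Ms;
        while (clock.nowMs() < deadline) {
            buffer.add(source.read64ms()); // backward recording of the fourth period
        }
        // the complete third audio data is now available; encoding can start
    }
}
```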
It can be understood that if, before receiving the recording instruction, the electronic device only buffers the audio data of the third time period in the third audio data, the amount of buffered PCM audio data is reduced, which relieves the pressure on the buffer area in the electronic device.
In one possible implementation manner, the end time of the first audio data is the first time, and the duration of the first audio data is a first encoding duration.
Illustratively, determining the first audio data from the third audio data includes: taking the first time as the ending time of the first audio data, and extracting the first audio data from the third audio data based on a first encoding duration, where the first encoding duration is the duration of the first audio data.
In one possible implementation manner, the starting time of the first audio data is earlier than the first time, the ending time of the first audio data is later than the first time, and the duration of the first audio data is a first encoding duration.
Illustratively, determining the first audio data from the third audio data includes: taking a second time as the starting time of the first audio data and extracting the first audio data from the third audio data based on the first encoding duration, where the second time is earlier than the first time and the ending time of the first audio data is later than the first time; or taking a third time as the ending time of the first audio data and extracting the first audio data from the third audio data based on the first encoding duration, where the third time is later than the first time and the starting time of the first audio data is earlier than the first time.
Illustratively, determining the first audio data from the third audio data includes: determining a first starting time and a first ending time according to user input, where the difference between the first ending time and the first starting time is less than or equal to the first encoding duration; and determining the audio data between the first starting time and the first ending time in the third audio data as the first audio data.
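All of the interception strategies above reduce to cutting a timestamp window out of the buffered chunks. A minimal sketch, with an assumed Chunk type holding one buffered PCM read plus its timestamps:

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

final class Chunk {
    final byte[] pcm;          // one buffered PCM read (e.g. 64 ms)
    final long startMs, endMs; // timestamps recorded with the buffer
    Chunk(byte[] pcm, long startMs, long endMs) {
        this.pcm = pcm; this.startMs = startMs; this.endMs = endMs;
    }
}

final class ClipSelector {
    /** Concatenates the buffered chunks overlapping [startMs, endMs]:
     *  the "first audio data" cut from the "third audio data". */
    static byte[] slice(List<Chunk> buffered, long startMs, long endMs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Chunk c : buffered) {
            if (c.endMs > startMs && c.startMs < endMs) {
                out.write(c.pcm, 0, c.pcm.length); // chunk-level precision suffices
            }
        }
        return out.toByteArray();
    }
}
```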
In one possible implementation, the recording instruction is a user initiated recording instruction.
In one possible implementation manner, the recording instruction is triggered by the electronic device according to a target condition, where the target condition is that the current playing content of the electronic device is target media stream data in the first media stream.
In a possible implementation, the first media stream includes audio data and video data, and after the second audio data in AAC format is obtained, the method further includes: writing the second audio data into an audio-video muxer according to a preset storage path; and performing audio-video synthesis based on the muxer and the second audio data to obtain second media stream data corresponding to the first time, where the second media stream data is media stream data in the MPEG-4 (mp4) format.
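A sketch of this synthesis step, assuming Android's MediaMuxer; the sample arrays stand in for the encoder output, and in the audio-video case the video track is added and written the same way:

```java
import android.media.MediaCodec;
import android.media.MediaFormat;
import android.media.MediaMuxer;
import java.nio.ByteBuffer;

final class Mp4Synthesizer {
    /** Writes encoded AAC samples (the second audio data) into an mp4 file
     *  at the preset storage path. */
    static void mux(String outputPath, MediaFormat aacFormat,
                    ByteBuffer[] samples, MediaCodec.BufferInfo[] infos) throws Exception {
        MediaMuxer muxer = new MediaMuxer(outputPath,
                MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4);
        int track = muxer.addTrack(aacFormat); // format from INFO_OUTPUT_FORMAT_CHANGED
        muxer.start();
        for (int i = 0; i < samples.length; i++) {
            muxer.writeSampleData(track, samples[i], infos[i]);
        }
        muxer.stop();
        muxer.release();
    }
}
```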
In a second aspect, an embodiment of the present application provides an electronic device, including: one or more processors and a memory; the memory is coupled with the one or more processors and is configured to store computer program code comprising computer instructions; the one or more processors invoke the computer instructions to cause the electronic device to perform the method of the first aspect or any possible implementation of the first aspect.
In a third aspect, embodiments of the present application provide a chip system, which is applied to an electronic device, and the chip system includes one or more processors configured to invoke computer instructions to cause the electronic device to perform the method shown in the first aspect or any possible implementation manner of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer program product comprising instructions which, when run on an electronic device, cause the electronic device to perform the method of the first aspect or any of the possible implementations of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer readable storage medium comprising instructions, characterized in that the instructions, when run on an electronic device, cause the electronic device to perform the method of the first aspect or any possible implementation of the first aspect.
It will be appreciated that the electronic device provided in the second aspect, the chip system provided in the third aspect, the computer program product provided in the fourth aspect, and the computer storage medium provided in the fifth aspect are all configured to perform the methods provided by the embodiments of the present application. For the advantageous effects they achieve, reference may be made to those of the corresponding methods; details are not repeated here.
Drawings
Fig. 1 is a schematic flow chart of an audio processing method according to an embodiment of the present application;
fig. 2 is a schematic diagram of a picture for playing a first media stream according to an embodiment of the present application;
FIG. 3 is a schematic diagram of storing PCM format audio data in a buffer queue array according to a first-in first-out principle according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a relationship between a first time period and a first time according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a relationship between a first time period and a second time period according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a relationship between a first time period and a second time period according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a relationship between a first time period and a first time according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a relationship between a first time period and a second time period according to an embodiment of the present disclosure;
fig. 9 is a flowchart of yet another audio processing method according to an embodiment of the present application;
fig. 10 is a flowchart of yet another audio processing method according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of an electronic device 100 according to an embodiment of the present application;
fig. 12 is a software configuration block diagram of the electronic device 100 of the embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the present application will be further described with reference to the accompanying drawings.
The terms "first" and "second" and the like in the description, claims, and drawings of the present application are used for distinguishing between different objects, not for describing a particular sequential order. Furthermore, the terms "comprising", "including", and "having", and any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to those steps or elements, but may include other steps or elements not expressly listed or inherent to the process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly understand that the embodiments described herein may be combined with other embodiments.
In the present application, "at least one (item)" means one or more, "a plurality" means two or more, and "at least two (items)" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: only A, only B, or both A and B, where A and B may be singular or plural. The character "/" generally indicates that the associated objects are in an "or" relationship. "At least one of the following items" or a similar expression means any combination of these items. For example, at least one (item) of a, b, or c may represent: a; b; c; "a and b"; "a and c"; "b and c"; or "a and b and c".
Generally, if the electronic device needs to implement a forward-recording function, triggered either by a recording instruction input by the user or by highlight-moment conditions preset on the device, the electronic device needs to buffer media stream data while the media stream plays. Then, after receiving a recording instruction triggered by the user at a first time, the electronic device can use the buffered media stream data, according to the recording time period specified by the user, to provide an efficient function of recording earlier media stream data. This also avoids the problem that, in some scenarios, the required recorded data cannot be recovered retrospectively if the media stream data was not buffered in real time (for example, in a real-time game battle, the game frames are displayed live; if they are not buffered in real time, the data to be recorded cannot be retrieved afterwards).
The following compares the advantages of the audio processing method according to the embodiments of the present application with other audio processing methods:
In other audio processing methods, while playing the media stream, an electronic device records the audio data in the media stream through Android's audio recording interface (AudioRecord) to obtain original audio data in the pulse code modulation (PCM) format, immediately encodes the recorded original audio data using the audio/video codec means provided by Android (MediaCodec) to obtain encoded audio data, and buffers the encoded audio data. Then, upon receiving a recording instruction triggered by the user at a first time, the device writes the encoded audio data corresponding to the user-specified recording time period into an audio-video muxer (media muxer), synthesizes the audio and video, and saves the synthesized media stream data locally so that the user can review it again or share it with others through third-party software.
In such an audio processing method, playing a media stream in a multimedia-container format (such as mp4) already requires a certain CPU utilization. Recording and encoding the PCM audio data at the same time as playback, which requires the CPU to perform data-format conversion on the audio data, further raises the CPU occupancy during playback; video stuttering and slow overall response then occur frequently, harming the user's entertainment experience.
In view of this, the embodiments of the present application provide an audio processing method and an electronic device that can effectively reduce the CPU occupancy while the electronic device plays a media stream. With the method provided by the embodiments of the present application, the electronic device collects the original audio data while playing the media stream, obtains the original audio data in PCM format (the third audio data), and then buffers the third audio data directly. That is, the electronic device does not immediately encode the recorded PCM-format original audio data with an encoding means such as MediaCodec after acquiring it. Instead, after receiving a recording instruction (which can be understood as an encoding instruction) initiated by the user at a first time, the device acquires, according to a default or user-specified recording time period, the target PCM-format original audio data corresponding to that period (the first audio data) from the buffered third audio data, and encodes the first audio data to obtain encoded audio data (the second audio data) in the advanced audio coding (AAC) format. The second audio data is saved locally for the user to review or share with others.
Therefore, only the original PCM audio data is recorded while the media stream plays, and the buffered PCM audio data is encoded only after a recording instruction is received, instead of recording and encoding the PCM original audio data simultaneously throughout playback. This reduces the CPU occupancy during playback as far as possible, reduces stuttering during media stream playback, alleviates the slow overall response of the electronic device while playing the media stream, and improves the user's entertainment experience.
In a possible implementation of the audio processing method provided by the embodiments of the present application, the duration of the third audio data is less than or equal to the second encoding duration, for example less than or equal to 20 s, which effectively reduces the buffering pressure that buffering PCM audio data places on the electronic device.
Generally, in a scenario where audio data and video data are synthesized, the encoded first video data is written into the media muxer first, and the encoded audio data is written afterwards. In the embodiment of the present application, after receiving the recording instruction, the electronic device writes the encoded first video data into the media muxer, then encodes the first audio data in the third audio data and writes the resulting AAC-format second audio data into the media muxer. Moreover, after the recording instruction is received, while the first video data is being written into the media muxer, the electronic device still updates the buffered audio data in the background. If the duration of the first audio data equals the duration of the third audio data and the first video data is written into the media muxer too slowly, the time period of the buffered third audio data (or first audio data) will no longer match the time period of the video data.
Illustratively, suppose the duration of the third audio data buffered by the electronic device equals the encodable duration of the first audio data, both set to 20 s. The first time, at which the electronic device receives the recording instruction, is 09:40 (9 min 40 s), so at that moment the buffered third audio data and first video data both cover the period 09:20 to 09:40. The electronic device first writes the encoded first video data into the media muxer and then encodes the third audio data (i.e., the first audio data). However, while the first video data is being written, a high CPU occupancy may slow the write; meanwhile, the electronic device keeps refreshing the array holding the third audio data with the current audio. For example, if writing the first video data takes 10 s, the contents of the buffer array for 09:20 to 09:30 are overwritten by the audio played after 09:40. As a result, the audio data finally written into the media muxer effectively contains only the period 09:30 to 09:40, which no longer matches the 09:20-to-09:40 period and content of the first video data written into the media muxer.
In view of this, in the embodiment of the present application, the duration of the buffered third audio data is set greater than the first encoding duration, while the duration of the first audio data to be encoded is less than or equal to the first encoding duration, reserving part of the buffer for the overwriting data. Therefore, in the audio-video synthesis scenario, even when the CPU occupancy is high and the first video data is written into the media muxer slowly, the probability that the actually available period of the buffered third audio data fails to match the period of the first video data is reduced.
For example, with the audio processing method provided by the present application, the difference between the duration of the third audio data and the first encoding duration is 10 s. After receiving the recording instruction, the electronic device has buffered the first video data for the period 09:20 to 09:40, but has buffered audio for the period 09:10 to 09:40. If writing the first video data into the media muxer is delayed by 10 s, the electronic device updates the buffer array during those 10 s, so the audio data for 09:10 to 09:20 is replaced by the audio played after 09:40. However, the period corresponding to the replaced data (09:10 to 09:20) does not overlap the period of the video data (09:20 to 09:40). Thus, when the first audio data corresponding to the first video data's period (09:20 to 09:40) is encoded and written into the media muxer, no mismatch arises between the periods and contents of the first audio data and the first video data.
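The example reduces to a simple inequality; as a sketch (names assumed), with 30 s buffered, a 20 s encodable window, and a 10 s write delay:

```java
final class MarginCheck {
    /** With buffered duration bufSec and encodable duration encSec, a video
     *  write delayed by delaySec overwrites the oldest delaySec of the buffer.
     *  The clip [t1 - encSec, t1] survives iff delaySec <= bufSec - encSec;
     *  here 10 <= 30 - 20 holds, so the example above is safe. */
    static boolean clipSurvivesDelay(double bufSec, double encSec, double delaySec) {
        return delaySec <= bufSec - encSec;
    }
}
```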
Taking the played first media stream data as movie data containing audio data and video data, and taking the recording instruction as a screen-recording instruction as an example, the data in Tables 1 and 2 compare performing the screen-recording task with other audio processing methods (recording and encoding the third audio data while the movie plays) against performing it with the audio processing method provided by the present application, in the first, second, third, and fourth sample environments.
For example, to support the recording function for the first media stream, a recording service process, a Java Native Interface (JNI) encapsulation process, a bottom-layer audio/video recording process, and an encoding process need to be started. Table 1 below shows the CPU occupancy of the electronic device when performing the screen-recording task with other audio processing methods in the first through fourth sample environments; with those methods, providing the screen-recording function occupies a total CPU share of 57.4% on average in one recording environment. Table 2 below shows the CPU occupancy when performing the same task with the audio processing method provided by the present application; with this method, providing the screen-recording function occupies a total CPU share of 30.8% on average. By comparison, the audio processing method provided by the embodiments of the present application lowers the total CPU occupancy by 26.6 percentage points.
TABLE 1
(Table image not reproduced: per-process CPU occupancy when the screen-recording task is performed with other audio processing methods in the first through fourth sample environments.)
TABLE 2
(Table image not reproduced: per-process CPU occupancy when the screen-recording task is performed with the audio processing method of the present application in the first through fourth sample environments.)
It can be appreciated that the embodiments of the present application may be executed by any electronic device having audio data recording and encoding functions; for example, the electronic device may be a mobile terminal, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, or an ultra-mobile personal computer (UMPC). The specific form of the electronic device is not limited here. For brevity, in some descriptions herein, the execution body (the electronic device) of the embodiments of the present application may be omitted.
Example 1:
the following describes in detail, with reference to fig. 1, an audio processing method provided in an embodiment of the present application, taking a first media stream played by an electronic device as an example, where the first media stream includes audio, or the first media stream includes audio and video.
The audio processing method provided in the embodiment of the present application mainly includes stage 1 to stage 3. In stage 1, the electronic device mainly performs the task of acquiring PCM-format audio data (see step S101); in stage 2, the electronic device determines whether a recording instruction has been received and starts the stage-3 task after receiving it (see step S102); in stage 3, the electronic device mainly performs the task of encoding the PCM-format audio data (see step S103).
Specifically, as shown in fig. 1, the audio processing method includes the following steps:
s101, when the first media stream is played, reading and caching third audio data corresponding to the first time period.
In an embodiment of the present application, the third audio data is original audio data in PCM format. It can be understood that PCM audio data records an analog signal such as sound as a train of quantized pulses: it is a digital signal made up of uniform samples, without any encoding or compression. Compared with an analog signal, PCM audio data is less susceptible to interference and distortion in the transmission system and offers good sound quality.
In the embodiment of the application, the electronic device may read the third audio data through an audio chip interface. For example, the chip interface may be a PCM audio interface, an inter-IC sound bus (IIS, also written I2S) interface, or an Audio Codec '97 (AC'97) interface. For example, the third audio data is read through the audio chip interface by periodically scanning the interface for new audio data.
Alternatively, in some possible implementations, the electronic device may collect the PCM-format audio data by constructing an AudioRecord instance, according to specific requirements; this is not limited here.
It can be understood that collecting PCM audio data through the AudioRecord method requires the specific audio channel being used to be in a usable state; if that channel is unusable, the electronic device cannot collect PCM audio data this way. Reading PCM audio data through the audio chip interface is more widely applicable and is not constrained by an audio channel.
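For reference, a minimal sketch of the AudioRecord alternative; the parameters are illustrative, and capturing media playback rather than the microphone typically needs a privileged source such as REMOTE_SUBMIX or the AudioPlaybackCapture API:

```java
import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;

final class PcmCapture {
    /** Builds an AudioRecord for 16-bit stereo 44.1 kHz PCM (assumed parameters). */
    static AudioRecord newPcmRecorder() {
        int sampleRate = 44100;
        int minBuf = AudioRecord.getMinBufferSize(sampleRate,
                AudioFormat.CHANNEL_IN_STEREO, AudioFormat.ENCODING_PCM_16BIT);
        // MIC is only usable while that channel is available, as noted above;
        // capturing system playback needs REMOTE_SUBMIX (system privilege).
        return new AudioRecord(MediaRecorder.AudioSource.MIC, sampleRate,
                AudioFormat.CHANNEL_IN_STEREO, AudioFormat.ENCODING_PCM_16BIT,
                minBuf * 4);
    }
}
```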
In an exemplary movie-playing scenario, when using preset movie-playing software (e.g., the Tencent Video application or a similar entertainment application), the electronic device receives a preset play instruction, starts playing the movie (i.e., playing the first media stream), and reads and buffers the third audio data. Alternatively, the electronic device pauses the movie during playback and then receives a play instruction to resume; it then plays the movie (i.e., plays the first media stream) and reads and buffers the third audio data.
For example, in a scenario where the first media stream is a movie including audio data and video data, the electronic device displays the screen shown in fig. 2. If the electronic device receives a touch operation on the play control 201, which is used to switch the playback state of the first media stream from paused to playing, this indicates that the electronic device is to play the first media stream.
For another example, in a game scenario, when using a preset game application, the electronic device receives a preset start-game instruction, begins drawing and displaying the real-time game frames, and thereby plays (displays) the real-time game content (i.e., plays the first media stream).
It can be appreciated that the audio processing method provided in the embodiments of the present application is applicable to scenarios of playing movies online, playing stored movies offline, game scenarios, playing audio online, and playing stored audio offline.
In this embodiment of the present application, a media stream refers to a segment of data that is continuous on the time axis, such as a segment of audio data, video data, or subtitle data. The media stream that the electronic device acquires and plays over the network may be uncompressed or compressed, and the particular format of the played media stream is irrelevant to the need to buffer and encode PCM audio data in the electronic device's forward-recording function. The electronic device may acquire and play an mp4 media stream over the network, or download and store the mp4 media stream and play it offline. Either way, since a multimedia container (for example, the mp4 format) is only a file-packaging format, in which all encoded video and audio are packed into one file container for presentation, directly clipping data on the basis of the container (for example, on mp4 data) is generally not supported. If the electronic device needs to implement the forward-recording function, it must record and encode the audio data and video data of the corresponding time periods separately, and then synthesize the encoded audio and video to obtain the media stream data to be extracted.
In the embodiment of the application, the electronic device may support only forward recording, buffering the PCM audio data of a first time period before the current playback time; or it may both buffer, by forward recording, the PCM audio data of a first time period before the current playback time and buffer, by backward recording, the PCM audio data of a second time period after it. The current playback time is the position on the progress bar of the first media stream corresponding to the media stream data currently being played.
In one possible implementation, when the electronic device supports only forward recording and buffering of PCM audio data and does not limit the duration of the recordable, bufferable PCM audio data, the starting time of the first period may be the starting playback time of the first media stream and its ending time the current playback time. That is, from the start of playback the electronic device always buffers the PCM audio data already played, i.e., it continuously reads and buffers the third audio data already played. Illustratively, in a game-competition scenario, the user is busy fighting and does not want to be distracted during the match; for a highlight moment in a battle, the user prefers not to trigger the recording function mid-fight, but rather, after the battle ends, to crop the highlight from the full-session recorded content and save it locally.
In other possible implementations, when the electronic device limits the duration of the PCM audio data that can be recorded and buffered, the relationships among the starting time and ending time of the first period and the first duration are as follows:
case 1:
The electronic device supports only forward recording and buffering of PCM audio data within a first duration. In this case, if the played duration of the first media stream is greater than or equal to the first duration, the starting time of the first period may be the time obtained by subtracting the first duration from the current playback time, and the ending time of the first period may be the current playback time. In this case, the first duration is the second encoding duration described elsewhere herein.
The first duration may be a preset duration, for example 20 s (20 seconds is only an example; other suitable durations are possible), i.e., the electronic device supports forward recording and buffering of 20 s of third audio data. It can be understood that if the played duration of the first media stream is less than the first duration, the starting time of the first period is the starting playback time of the first media stream.
In the embodiment of the present application, when the electronic device supports only forward recording and buffering of PCM audio data within the first duration, it may use a first cyclic array (elsewhere herein also called the first array) to buffer the third audio data corresponding to the first duration. For example, each time the electronic device reads PCM audio data through the chip interface, it obtains the PCM audio data for a 64 ms duration; generally, the size of one read equals the size of one buffer in the electronic device, i.e., one buffer holds 64 ms of PCM audio data. If the first duration is 20.096 s, the electronic device may employ 314 buffers (20.096 / 0.064 = 314) to buffer the third audio data, e.g., a first cyclic array of length 314. During playback, each 64 ms of raw audio the electronic device reads is cached into the first cyclic array. Each element of the first cyclic array is an instantiated object of an audio-data class, and each instantiated object contains an audio-data buffer and the timestamp corresponding to the audio data in that buffer. For example, the cyclic array B comprises B[0], B[1], B[2], ..., where B[2] includes one buffer of data together with the start timestamp and end timestamp of that data.
Illustratively, buffering PCM audio data into the first cyclic array means that the array is filled from the first element to the last, and once full, the elements are reused from the first element onward, each reuse replacing and updating that element. For example, PCM audio data is cached in the first cyclic array as shown in fig. 3, where audio-data instantiation object 1 (object1) is the first element and object314 is the last. Once every object from object1 to object314 holds audio data (a buffer plus the timestamp of the buffered data), if the electronic device continues playing the media stream, the next read of audio data x overwrites the data in object1, the read after that of audio data y overwrites object2, and so on up to object314, after which the cyclic reuse of array elements begins again from object1.
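A sketch of this first cyclic array, under the sizes assumed above (314 elements of 64 ms each); the class and field names are illustrative:

```java
/** Fixed-size cyclic array of 64 ms PCM buffers with their timestamps;
 *  once full, the write index wraps so object1 is reused after object314. */
final class PcmRingBuffer {
    static final int CAPACITY = 314;            // 20.096 s / 0.064 s per read
    private final byte[][] buffers = new byte[CAPACITY][];
    private final long[] startMs = new long[CAPACITY];
    private final long[] endMs = new long[CAPACITY];
    private int next = 0;                       // element to overwrite next
    private int filled = 0;

    void put(byte[] pcm64ms, long tStartMs) {
        buffers[next] = pcm64ms;
        startMs[next] = tStartMs;
        endMs[next] = tStartMs + 64;
        next = (next + 1) % CAPACITY;           // wrap-around reuse
        if (filled < CAPACITY) filled++;
    }

    /** Timestamp of the oldest buffered chunk, i.e. the start of the first period. */
    long oldestStartMs() {
        return filled < CAPACITY ? startMs[0] : startMs[next];
    }
}
```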
It will be appreciated that, just as each read may yield 64 ms of PCM audio data, the amount of data per read may also be chosen according to specific requirements; for example, each read could instead yield 32 ms of PCM audio data. This is not limited here.
The difference between a time a and a duration b described herein corresponds to the time obtained by subtracting b from a. For example, the difference between the current playback time of the first media stream and the first duration is the time corresponding to the value obtained by subtracting the first duration from the current playback time.
Case 2:
the electronic device supports forward recording of PCM format audio data in a first duration and backward recording of PCM format audio data in a second duration.
In this case, if the played duration of the first media stream is greater than or equal to the first duration, the starting time of the first period may be the time corresponding to the difference between the current playback time of the first media stream and the first duration; if the played duration of the first media stream is less than the first duration, the starting time of the first period may be the starting playback time of the first media stream. If the unplayed duration of the first media stream is greater than or equal to the second duration, the ending time of the first period may be the time corresponding to the sum of the current playback time and the second duration; if the unplayed duration of the first media stream is less than the second duration, the ending time of the first period may be the ending playback time of the first media stream. In this case, the sum of the first duration and the second duration is the second encoding duration described elsewhere herein.
Illustratively, the first duration is 10 s and the second duration is 10 s, i.e., the electronic device supports buffering 10 s of original audio data by forward recording (the first duration) and 10 s of original audio data by backward recording (the second duration).
Illustratively, the electronic device may buffer the original audio data within 10 s before the current time through a second cyclic array, and buffer the original audio data within 10 s after the current time through a third cyclic array. It can be understood that supporting 10 s of forward recording and buffering and 10 s of backward recording and buffering is only an example; other suitable durations are possible and are not limited here. For example, the electronic device may support 20 s of forward recording (i.e., the first duration) and 10 s of backward recording (i.e., the second duration). For the second and third cyclic arrays, refer to the description of the first cyclic array above; details are not repeated here.
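The boundary arithmetic of case 2 can be summarized in one small helper (all names and units assumed):

```java
final class RecordingWindow {
    /** Start/end of the first time period under case 2, clamped to the bounds
     *  of the first media stream: up to d1 ms forward, up to d2 ms backward. */
    static long[] firstPeriod(long nowMs, long streamEndMs, long d1Ms, long d2Ms) {
        long start = Math.max(0, nowMs - d1Ms);         // clamp to the starting playback time
        long end = Math.min(streamEndMs, nowMs + d2Ms); // clamp to the ending playback time
        return new long[]{start, end};
    }
}
```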
It can be understood that, with the development of hardware caching capabilities and of caching techniques such as data deduplication, the caching capacity of electronic devices grows ever stronger, and buffering the third audio data of the first period does not impose excessive pressure on the electronic device. However, in one possible implementation, considering that some electronic devices still have small buffer space, and an overly long buffered PCM segment would put heavy pressure on them, the first duration and the second duration may be set to smaller values, e.g., each less than or equal to 20 s (or other suitable values).
S102, acquiring a recording instruction at a first moment.
It will be appreciated that the term "recording instruction" above is merely exemplary; it may also be understood as a preset encoding instruction, storage instruction, or synthesis instruction, which is not limited here.
For example, the recording instruction may be manually triggered by the user while the media stream plays. For example, when the user sees and/or hears a highlight during playback and wants to capture and save it locally, the recording function (including forward encoded recording and/or backward encoded recording) can be triggered by the recording instruction, e.g., by tapping a preset recording control on the screen of the electronic device or through a preset shortcut gesture or shortcut key.
The recording instruction may be an encoding instruction, a storage instruction, or a synthesis instruction. Alternatively, in other possible implementations, the electronic device may generate an encoding or storage instruction from the recording instruction to trigger the execution of step S103, which is not limited here.
For example, the recording instruction may also be a default recording instruction. For instance, in an online game scenario, after the electronic device finishes playing the real-time media stream of an online match and has recorded the PCM-format third audio data covering the entire match, a default encode-and-store instruction is generated to trigger the encoding of the third audio data, obtain the encoded data, and keep it stored for a period of time.
The recording instruction may also be a preset data-encoding-and-buffering instruction generated when a highlight moment is detected. For example, in a movie-playing scenario, by default the encoding instruction is generated when the electronic device plays the target content; the electronic device may also let the user custom-select the target content. Likewise, in a live game scenario, if the electronic device detects a highlight battle moment (e.g., a multi-kill moment such as a quadra kill or penta kill), the encoding instruction is generated by default; the electronic device may also let the user custom-select the target content.
For example, referring again to fig. 2, if the electronic device receives a touch operation on the recording control 202, this indicates that a user-initiated recording instruction has been received.
It can be understood that, in an audio/video recording-and-synthesis scenario, from the electronic device's perspective the user-initiated recording instruction is an encoding instruction for encoding the buffered audio data and video data.
S103, acquiring first audio data corresponding to a second time period in the third audio data based on the first time, encoding the first audio data to obtain second audio data, and storing the second audio data.
The electronic device may trigger the recording and buffering of the third audio data when it starts playing the first media stream. When a user-initiated recording instruction (which may also be understood as an encoding instruction) is detected, the first audio data within the third audio data is encoded. Note that before receiving the recording instruction, the electronic device only buffers the third audio data and does not encode any of it.
In one possible implementation, if the electronic device does not restrict the duration of the PCM audio data it supports encoding, the second time period may be the same as the first time period. That is, the electronic device may encode all of the buffered PCM audio data. In some descriptions, this second time period is also called the user-specified recording time period.
In one possible implementation, the electronic device may limit the duration of the PCM audio data that it supports encoding to be less than or equal to a maximum encoding duration (elsewhere herein also called the first encoding duration). When the electronic device supports only forward recording and buffering of PCM audio data within the first duration, the first encoding duration is less than the first duration. When the electronic device supports forward recording of PCM audio data within the first duration and backward recording and buffering within the second duration, the first encoding duration is less than the sum of the first duration and the second duration. The first encoding duration is the maximum duration of data that the electronic device permits to be encoded.
It can be understood that if the duration of the buffered PCM audio data of the first period is greater than or equal to the first encoding duration, the duration of the second period is bounded by the first encoding duration, i.e., it is less than or equal to the first encoding duration. If the duration of the buffered PCM audio data of the first period is less than the first encoding duration, the duration of the second period is bounded by the duration of that buffered data, i.e., it is less than or equal to the duration of the PCM audio data of the first period.
That is, provided the duration of the buffered PCM audio data of the first period is greater than or equal to the first encoding duration, the duration of the second period is less than or equal to the first encoding duration. For example, suppose the electronic device supports only forward recording and buffering of PCM audio data within a first duration of 20 s, meaning it can buffer at most 20 s of PCM audio data, and the first encoding duration is 16 s (16 s < 20 s), meaning the device supports encoding at most 16 s of PCM audio data. If the buffered PCM audio data lasts 16 s or more, the duration of the second period may be any value up to 16 s.
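In other words, the upper bound on the second period's duration is the smaller of the two quantities; as a one-line sketch:

```java
final class ClipBound {
    /** Effective upper bound for the duration of the second time period:
     *  e.g. min(20 s buffered, 16 s first encoding duration) = 16 s. */
    static long maxSecondPeriodMs(long bufferedMs, long firstEncodingMs) {
        return Math.min(bufferedMs, firstEncodingMs);
    }
}
```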
From the relationship between the starting/ending times of the first period and the current time (the current time being the first time in step S103), the relationship between the starting/ending times of the second period and the first time can be derived. The following cases apply:
case 3:
When the electronic device supports only forward recording and buffering of PCM audio data within the first duration, or supports only forward recording and buffering without limiting the duration of the recordable, bufferable PCM audio data, the third audio data corresponding to the first period is stored in the electronic device. As shown in fig. 4, when the electronic device receives the recording instruction at a first time (time t1), the third audio data of the first period is buffered in the device, and the ending time of the first period is time t1.
According to whether the electronic device supports user-defined selection of the start time and/or the end time of the second time period, case 3 can be further divided into the following three cases:

Case 3-1: the electronic device does not support user customization of the duration corresponding to the second time period and does not support user customization of the start time and the end time of the second time period, or the electronic device supports user customization of the start time and the end time of the second time period but the user does not customize them. In this case, the second time period may be a default time period. For example, as shown in fig. 5, by default, the duration corresponding to the second time period is equal to the first encoding duration, the start time of the second time period is the time corresponding to the difference between time t1 and the first encoding duration, and the end time of the second time period is time t1.

Case 3-2: the electronic device does not support user customization of the duration corresponding to the second time period, but supports user customization of either the start time or the end time of the second time period. As shown in fig. 6, the duration corresponding to the second time period is still equal to the first encoding duration, and the start time or the end time of the second time period is the time designated by the user.

Case 3-3: the electronic device supports user customization of the duration corresponding to the second time period, and supports user customization of the start time and the end time of the second time period. The start time of the second time period is earlier than time t1, and the duration corresponding to the second time period may be less than or equal to the first encoding duration.
Case 4:
As shown in fig. 7, in the case where the electronic device supports forward recording and caching of PCM format audio data within the first duration and backward recording and caching of PCM format audio data within the second duration, when the electronic device receives the recording instruction at time t1, the electronic device has already recorded and cached the forward PCM format audio data corresponding to the first duration, and it may then record and acquire the PCM format audio data corresponding to the second duration backward, so as to obtain the complete third audio data corresponding to the first time period.
It can be understood that the electronic device records and caches the PCM format audio data corresponding to the second duration after the first time at which the recording instruction is received, that is, the third audio data is not completely buffered until after the first time, which is not limited herein.
Case 4 may be further divided into the following cases according to whether the electronic device supports user-defined selection of the start time and/or the end time of the second time period:

In case 4-1, the electronic device does not support user customization of the start time and the end time of the second time period, or the electronic device supports such customization but the user does not customize the start time and the end time; in this case, the duration corresponding to the second time period is equal to the first encoding duration. As shown in fig. 8, the first encoding duration is equal to the sum of one half of the first duration and one half of the second duration, the start time of the second time period is the time corresponding to the difference between time t1 and one half of the first duration, and the end time of the second time period is the time corresponding to the sum of time t1 and one half of the second duration. The one-half split is merely an example; other suitable values are also possible, which are not limited herein.

In case 4-2, the electronic device supports user customization of either the start time or the end time of the second time period, or supports user customization of both; the start time and the end time corresponding to the second time period may each be before or after the first time.
In the embodiment of the application, the second audio data obtained after the first audio data is encoded is audio data in an advanced audio coding (advanced audio coding, aac) format. That is, the aac format audio data is obtained by compressing the first audio data with an audio compression algorithm of high compression ratio; compared with the first audio data, the aac format audio data has a smaller data volume and less data redundancy.
In the embodiment of the present application, the first audio data may be encoded in either of the following two ways.
Mode 1: the encoding of the first audio data is achieved through the MediaCodec application programming interface (API), which accesses the multimedia codec of the Android underlying layer. The MediaCodec interface provides audio/video encoding/decoding functions, and can be configured through parameters to decide which encoding algorithm to use and whether to encode the first audio data with hardware encoding acceleration. It is to be appreciated that the audio processing methods provided herein are equally applicable to the iOS platform, which is not limited in this regard.
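The following is a minimal Java sketch of Mode 1 using Android's MediaCodec API in synchronous mode; the AAC profile, bitrate, timeout values, and loop structure are illustrative choices, not the patent's exact implementation.

```java
import android.media.MediaCodec;
import android.media.MediaCodecInfo;
import android.media.MediaFormat;
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Feed 16-bit PCM in, collect raw AAC frames out (synchronous MediaCodec loop).
final class AacEncoder {
    static byte[] encode(byte[] pcm, int sampleRate, int channels) throws Exception {
        MediaFormat format = MediaFormat.createAudioFormat(
                MediaFormat.MIMETYPE_AUDIO_AAC, sampleRate, channels);
        format.setInteger(MediaFormat.KEY_AAC_PROFILE,
                MediaCodecInfo.CodecProfileLevel.AACObjectLC); // AAC-LC profile
        format.setInteger(MediaFormat.KEY_BIT_RATE, 128_000);  // illustrative bitrate

        MediaCodec codec = MediaCodec.createEncoderByType(
                MediaFormat.MIMETYPE_AUDIO_AAC);
        codec.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
        codec.start();

        ByteArrayOutputStream aac = new ByteArrayOutputStream();
        MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
        int offset = 0;
        boolean inputDone = false, outputDone = false;

        while (!outputDone) {
            if (!inputDone) {
                int inIndex = codec.dequeueInputBuffer(10_000);
                if (inIndex >= 0) {
                    ByteBuffer in = codec.getInputBuffer(inIndex);
                    int chunk = Math.min(in.remaining(), pcm.length - offset);
                    // Presentation time in microseconds, from the sample position
                    // (2 bytes per sample per channel).
                    long ptsUs = 1_000_000L * (offset / (2L * channels)) / sampleRate;
                    in.put(pcm, offset, chunk);
                    offset += chunk;
                    inputDone = offset >= pcm.length;
                    codec.queueInputBuffer(inIndex, 0, chunk, ptsUs,
                            inputDone ? MediaCodec.BUFFER_FLAG_END_OF_STREAM : 0);
                }
            }
            int outIndex = codec.dequeueOutputBuffer(info, 10_000);
            if (outIndex >= 0) {
                ByteBuffer out = codec.getOutputBuffer(outIndex);
                if ((info.flags & MediaCodec.BUFFER_FLAG_CODEC_CONFIG) == 0) {
                    byte[] frame = new byte[info.size];
                    out.get(frame);       // raw AAC frame (no ADTS header)
                    aac.write(frame);
                }
                codec.releaseOutputBuffer(outIndex, false);
                outputDone = (info.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0;
            }
        }
        codec.stop();
        codec.release();
        return aac.toByteArray();
    }
}
```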
Mode 2: the first audio data is aac-encoded through ffmpeg (fast forward moving picture experts group), a command line tool for converting audio formats. Specifically, the corresponding encoding parameters are set through the ffmpeg command line tool, and an aac encoder is called to encode the first audio data.
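As an illustrative sketch of Mode 2, the following Java code invokes an ffmpeg binary (assumed to be available on the system path) through ProcessBuilder. The flags shown are standard ffmpeg options (-f s16le for raw 16-bit little-endian PCM input, -ar and -ac for sample rate and channel count, -c:a aac and -b:a for encoder and bitrate), while the file names and parameter values are assumptions.

```java
import java.io.IOException;

// Transcode a raw PCM file into aac by calling the ffmpeg command line tool.
final class FfmpegAacEncoder {
    static void encode(String pcmPath, String aacPath)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(
                "ffmpeg",
                "-f", "s16le",   // raw 16-bit little-endian PCM input
                "-ar", "44100",  // sample rate
                "-ac", "2",      // channel count
                "-i", pcmPath,
                "-c:a", "aac",   // select the aac encoder
                "-b:a", "128k",  // target bitrate
                aacPath)
                .inheritIO()
                .start();
        if (p.waitFor() != 0) {
            throw new IOException("ffmpeg encoding failed");
        }
    }
}
```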
It is understood that the technical solutions described in modes 1 and 2 above may be combined with the solutions described in cases 1 to 4 above, which is not limited herein.
In an embodiment of the present application, storing the second audio data may be: writing the second audio data into an audio-video synthesizer (media multiplexer) according to a preset storage path for audio-video synthesis. The audio-video synthesizer (media multiplexer) is configured to synthesize audio data and video data, and to output a second media stream file in mp4 format to the preset storage path, that is, to store the mp4 format media stream file in the local disk storage space corresponding to the preset storage path in the electronic device.
Generally, when audio-video synthesis is performed, the storage location of the output mp4 media stream data, that is, the preset storage path, is specified. That is, in the scenario where the recording instruction indicates recording audio and video, after the recording instruction is received, the encoded second audio data is necessarily stored in the memory of the electronic device. In some scenarios, after completing the audio-video synthesis, the electronic device may further prompt for an instruction confirming whether to store: if the electronic device receives a user-initiated instruction to confirm storage, it reads the second media stream file into a preset application program (for example, an album application program); if the electronic device receives a user-initiated instruction to cancel storage, it deletes the stored second media stream file.
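As an illustrative sketch of this synthesis step, the following Java code wraps Android's MediaMuxer, the platform API for mp4 muxing; the class name, output-path handling, and the audio-only track setup are simplifying assumptions (a real synthesizer would also add a video track).

```java
import android.media.MediaCodec;
import android.media.MediaFormat;
import android.media.MediaMuxer;
import java.nio.ByteBuffer;

// Writes encoded aac samples (the "second audio data") into an mp4 file at
// the preset storage path via MediaMuxer.
final class Mp4Writer {
    private final MediaMuxer muxer;
    private final int audioTrack;

    Mp4Writer(String presetStoragePath, MediaFormat aacFormat) throws Exception {
        muxer = new MediaMuxer(presetStoragePath,
                MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4);
        // The track format would normally come from the encoder's
        // INFO_OUTPUT_FORMAT_CHANGED event.
        audioTrack = muxer.addTrack(aacFormat);
        muxer.start();
    }

    // Writes one encoded aac sample with its timing metadata.
    void writeAudioSample(ByteBuffer sample, MediaCodec.BufferInfo info) {
        muxer.writeSampleData(audioTrack, sample, info);
    }

    // Finalizes the mp4 file (the second media stream file).
    void finish() {
        muxer.stop();
        muxer.release();
    }
}
```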
In another possible implementation manner, the second audio data may be stored in the local disk storage space of the electronic device as a separate audio media stream file, which is not limited herein. Alternatively, after completing the encoding work to obtain the second audio data, the electronic device may first cache the second audio data in a buffer. After receiving a user-initiated instruction to confirm storage, the electronic device stores the second audio data into its local disk storage space as an independent audio media stream; if an instruction initiated by the user to cancel storage is received, the electronic device deletes the cached second audio data. This is not limited herein.
In the embodiment of the application, in the process of playing the first media stream, the electronic device records and caches the third audio data in the PCM format corresponding to the first time period, and after receiving the preset recording instruction at the first time, encodes the first audio data corresponding to the second time period in the third audio data. That is, the electronic device encodes only the audio data corresponding to the first time, rather than the audio data of the whole process of playing the first media stream, thereby avoiding recording and encoding audio data in real time while playing the media stream, or reducing the duration of real-time recording and encoding while playing the media stream. This reduces the CPU occupancy rate of the electronic device when playing the media stream, reduces the probability of stuttering, and improves the user experience.
It can be appreciated that the electronic device may continuously record and cache the third audio data corresponding to the first time period throughout the playing of the first media stream. For example, after the electronic device receives the recording instruction at the first time and completes the encoding work of the first audio data and the storing work of the second audio data, the electronic device still continues to execute the task of playing the media stream, and continuously updates the stored third audio data corresponding to the first time period.
It can be appreciated that in other audio processing methods, the electronic device continuously records and encodes audio data during media stream playing, so the electronic device encodes all PCM format audio data of the entire film no matter how many times it receives a user-initiated recording instruction. With the method provided in the embodiment of the present application, by contrast, the electronic device encodes only the audio data corresponding to the first time during the whole media stream playing process; for example, if the electronic device receives one user-initiated recording instruction, it only needs to encode the audio data of at most one recording time period indicated by the user.
The following describes in detail, with reference to specific scenarios, which aspects of the audio processing method provided by the embodiments of the present application may reduce the CPU occupancy rate.
For example, in one scenario, the electronic device may never receive a user-initiated recording instruction during the entire playing of the first media stream.
If other audio processing methods are adopted, the electronic device records and encodes the audio while playing the media stream throughout the playing of the first media stream, and caches the encoded aac format audio data. However, since the electronic device receives no user-initiated recording instruction during the whole media stream playing process, the aac format audio data obtained by encoding is never used; meanwhile, the large CPU occupancy rate required for encoding the PCM format audio data causes problems such as stuttering and a low overall response speed of the electronic device during media stream playing.
However, with the audio processing method provided in the embodiment of the present application, the electronic device records and caches PCM format audio data while playing the audio, and does not encode the PCM format audio data until a recording instruction from the user is received. If the electronic device receives no user-initiated recording instruction during the whole media stream playing process, the encoding work on the PCM format audio data of the entire film is avoided altogether, reducing the performance loss.
For example, in some scenarios, the electronic device only supports forward recording (i.e., it only encodes PCM format data buffered before the first time to generate encoded data), and the electronic device receives a recording instruction at only one first time during the entire media stream playing process. Then, with the audio processing method provided herein, the electronic device performs no audio data encoding task while playing the media stream before the first time; it only spends a portion of time after the first time encoding the PCM format data selected by the user.
For example, in a film playing scenario, the electronic device pauses the film playing thread after entering the recording page at the first time, and then performs the encoding work on the historical PCM format audio data, so that the problem of a high CPU occupancy rate caused by audio encoding while playing the film can be completely avoided.
Or, for another example, the electronic device enters the recording page at the first time while still displaying the game operation page, that is, the game thread is not paused. Although the electronic device inevitably needs to play the media stream and encode the historical PCM format audio data corresponding to the recording time period simultaneously for a period of time after the first time, until the encoding ends at a second time, the electronic device no longer needs to encode PCM format audio data before the first time and after the second time. This reduces the duration for which audio encoding tasks run while the media stream is playing, and thus reduces the CPU occupancy rate during media stream playing.
For example, in some scenarios, the electronic device supports both forward and backward recording (i.e., it encodes PCM format data of a first duration before the first time and PCM format data of a second duration after the first time to generate encoded data), and during the entire media stream playing process, the electronic device receives a recording instruction only once, at the first time. Likewise, with the audio processing method provided herein, the electronic device performs no audio data encoding task while playing the media stream before the first time; it only spends a portion of time after the first time encoding the PCM format data selected by the user.
For example, in a film playing scenario, the electronic device pauses the film playing thread after entering the recording page at the first time, first records the PCM format audio data corresponding to the recordable duration supported after the first time to obtain the complete selectable PCM format audio data, and then encodes the PCM format audio data according to the recording time period selected by the user, so that the problem of a high CPU occupancy rate caused by audio encoding while playing the film can be completely avoided.
Or, for another example, the electronic device enters the recording page at the first time while still displaying the game operation page, that is, the game thread is not paused. Although the electronic device inevitably needs to record, in real time while playing the media stream, the PCM format audio data corresponding to the recordable duration after the first time, and to encode that PCM format audio data for a period of time after the first time until the encoding ends at a second time, the electronic device no longer needs to encode PCM format audio data before the first time and after the second time. This reduces the duration for which audio encoding tasks run while the media stream is playing, and thus reduces the CPU occupancy rate during media stream playing.
In one possible implementation manner, to save cache space, the electronic device may delete the cached third audio data after playing the first media stream data. Alternatively, in other implementations, after playing the first media stream data, the electronic device may retain the cached third audio data for a period of time before deleting it, according to specific requirements, or encode the third audio data into aac format audio data and retain that for a period of time before deleting it, which is not limited herein. Whether the electronic device keeps the third audio data is independent of whether the electronic device detects the recording instruction in the process of playing the first media stream. For example, in a scenario where the media stream played by the electronic device is that of a film or television work, the electronic device may delete the cached third audio data after determining that the current episode has finished playing. For another example, in a scenario where the media stream played by the electronic device is a game picture, the electronic device may encode all the cached third audio data into aac format audio data after determining that the current game round has ended, and delete it after storing it for a period of time.
It can be understood that, with the method provided by the embodiment of the present application, the content of the third audio data is cached in a buffer, and the life cycle of the buffer is tied to the life cycle of the recording service provided by the electronic device. Illustratively, when playing a film, the electronic device maintains a process of a screen recording service in the background to provide the screen recording function for the user. The screen recording function executes the method provided by the embodiment of the present application, and the corresponding buffer space is used to cache the third audio data. After the playing of the film ends, the electronic device destroys the process of the screen recording service and releases the corresponding buffer space (that is, clears the data cached in the buffer).
That is, if the electronic device needs to keep the cached third audio data for a period of time according to specific requirements after playing the first media stream data, the third audio data needs to be stored in the electronic device before the recording service process is destroyed, or the third audio data is encoded into aac format audio data and then stored in the electronic device.
In the embodiment of the application, considering that audio data is small, caching the third audio data in the non-encoded PCM format does not place excessive storage pressure on the electronic device. However, video data is relatively large, and caching video data in the non-encoded unprocessed image (raw) format places a certain caching pressure on the electronic device. The audio processing method provided in the present application is therefore mainly described by taking the processing of audio data as an example, but this does not indicate that the concept of the audio processing method provided in the present application is inapplicable to video processing scenarios.
For example, the ideas of the audio processing method provided in the present application can also be used in video processing scenarios. In the audio and video recording function provided by the electronic device, the electronic device may cache the audio data and the video data forward, and when the user triggers the highlight function, the cached audio data and video data are used for audio-video synthesis to obtain a highlight video. The processing of the video data may specifically include: in the case where the electronic device meets a preset condition, reading and caching first original video data in raw format corresponding to the first time period; after receiving a recording instruction at the first time, acquiring second original video data corresponding to a second time period in the first original video data based on the first time; and encoding the second original video data with an encoder (for example, an H.264 or H.265 encoder) to obtain target encoded video data, and writing the target encoded video data into the media multiplexer for audio-video synthesis.
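As a sketch of extending the same idea to video, the following Java code configures an Android MediaCodec H.264 (video/avc) encoder for buffered raw frames; the resolution, bitrate, frame rate, and class name are illustrative assumptions.

```java
import android.media.MediaCodec;
import android.media.MediaCodecInfo;
import android.media.MediaFormat;

// Configures an H.264 encoder for buffered raw video, mirroring the audio
// flow above. Resolution, bitrate, and frame rate are illustrative.
final class AvcEncoderConfig {
    static MediaCodec create() throws Exception {
        MediaFormat format = MediaFormat.createVideoFormat(
                MediaFormat.MIMETYPE_VIDEO_AVC, 1280, 720);
        format.setInteger(MediaFormat.KEY_COLOR_FORMAT,
                MediaCodecInfo.CodecCapabilities.COLOR_FormatYUV420Flexible);
        format.setInteger(MediaFormat.KEY_BIT_RATE, 4_000_000);
        format.setInteger(MediaFormat.KEY_FRAME_RATE, 30);
        format.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1);

        MediaCodec codec = MediaCodec.createEncoderByType(
                MediaFormat.MIMETYPE_VIDEO_AVC);
        codec.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
        // The caller starts the codec and feeds raw frames using the same
        // queue/dequeue loop as the aac encoder sketch in Mode 1.
        return codec;
    }
}
```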
Embodiment 2:
The audio processing method provided in embodiment 1 above can also be applied to an audio recording (i.e. sound recording) scenario of the electronic device. The audio processing method provided in the embodiment of the present application is described in detail below with reference to fig. 9. In the recording scenario, after receiving the recording instruction to start recording, the electronic device records and caches PCM format audio data (i.e. S901), which corresponds to the electronic device playing the first media stream data and caching the first original audio data in step S101; the electronic device receiving the end instruction for ending the recording (step S902) corresponds to the electronic device acquiring the recording instruction in S102 in embodiment 1.
Specifically, as shown in fig. 9, the audio processing method includes the following steps:
S901, after receiving a recording instruction to start recording audio, start recording and caching PCM format original audio data.
For example, for a recording application program (i.e., application software for recording sound) of the electronic device, the electronic device detects, at the application interface of the recording application program, that the user clicks a control for starting to record audio; that is, the electronic device receives a recording instruction for starting to record audio.
For the PCM format audio data, and how to obtain the PCM format audio data, refer to the foregoing related description (e.g. related description in step S101 in embodiment 1), which will not be described in detail herein.
S902, it is determined whether an end instruction to end recording is received.
It can be understood that the above end instruction for ending recording serves as the recording instruction herein, and is used to instruct the electronic device to encode and store the audio data after completing the recording, where encoding the audio data can also be understood as producing the audio data.
Specifically, after determining that the end instruction is not received, step S903 is performed; after determining that the end instruction is received, step S904 is performed.
S903, recording and buffering the original audio data continuously.
S904, end the recording and obtain the third audio data.
In some descriptions herein, the third audio data may also be referred to as the original audio data.
In this embodiment of the present application, after receiving a recording instruction for starting recording audio, the electronic device starts recording and buffers PCM format original audio data continuously until receiving an end instruction for ending recording. The starting time of the third audio data is the time when the electronic equipment receives the recording instruction, and the ending time of the third audio data is the time when the electronic equipment receives the ending instruction.
S905, encode the third audio data to obtain the second audio data, and store the second audio data.
In the embodiment of the application, when the electronic device executes the recording function, after receiving the recording instruction it continuously records and caches PCM format audio data until an end instruction for ending the recording is received, and only then encodes the cached third audio data. Compared with encoding while recording, this reduces the duration for which PCM format audio data encoding tasks run, reduces the CPU occupancy rate while recording audio data, reduces the influence of recording audio data on the overall response speed of the device, and improves the user experience.
It will be appreciated that in some scenarios, the electronic device may also use other functions while performing the sound recording function. When the electronic device uses the recording software together with other application software with high CPU (central processing unit) occupancy (such as video software or game software), recording and caching the PCM format audio data first, instead of recording and encoding in real time, and encoding the audio data only after the recording ends, reduces the duration of the PCM format encoding tasks, reduces the CPU occupancy rate while recording audio data, and reduces the influence of recording audio data on the overall response speed.
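The following Java sketch illustrates the record-then-encode flow of S901 to S905 using Android's AudioRecord API; the sample rate, channel configuration, and the stop flag standing in for the end instruction are assumptions, and a real application additionally needs the RECORD_AUDIO permission.

```java
import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;
import java.io.ByteArrayOutputStream;

// Record first, encode later: PCM is only buffered here (S901/S903); the
// cached bytes become the third audio data (S904) and are handed to an aac
// encoder such as the Mode 1 sketch above only after recording ends (S905).
final class RecordThenEncode {
    private volatile boolean stopRequested = false; // set on the end instruction (S902)

    void requestStop() { stopRequested = true; }

    byte[] recordPcm() {
        int rate = 44_100;
        int minBuf = AudioRecord.getMinBufferSize(rate,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC,
                rate, AudioFormat.CHANNEL_IN_MONO,
                AudioFormat.ENCODING_PCM_16BIT, minBuf * 4);

        ByteArrayOutputStream pcm = new ByteArrayOutputStream();
        byte[] chunk = new byte[minBuf];
        recorder.startRecording();
        while (!stopRequested) {                  // S902/S903: keep buffering
            int n = recorder.read(chunk, 0, chunk.length);
            if (n > 0) pcm.write(chunk, 0, n);
        }
        recorder.stop();
        recorder.release();
        return pcm.toByteArray();                 // S904: the third audio data
    }
}
```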
Reference may be made to the relevant description of other embodiments herein (e.g. the relevant description in step S103 in embodiment 1) for how the third audio data in PCM format is encoded in particular, which will not be described in detail here.
Embodiment 3:
The following describes in detail the audio processing method provided in the embodiment of the present application with reference to fig. 10. As shown in fig. 10, the audio processing method includes the following steps:
S1001, during the process of playing the first media stream, a recording instruction is received.
In this embodiment of the present application, the recording instruction is used to instruct the electronic device to record audio data corresponding to a first time, where the first time is a playing time of the first media stream when the recording instruction is received.
In some possible implementations, the recording instruction may be a user-initiated recording instruction.
It will be appreciated that in a scenario where the first media stream includes audio data and video data, the recording instruction generally instructs recording the audio, recording the video, encoding the recorded audio and video, and using the encoded audio and video for audio-video synthesis. For example, referring again to fig. 2, a user may initiate the above recording instruction via the recording control 202 shown in fig. 2. In the embodiment of the present application, the recording instruction is in effect an encoding instruction initiated by the user to encode the cached media stream data (e.g., audio data).
In some possible implementations, the recording instruction is triggered by the electronic device according to a target condition, where the target condition is that the current playing content of the electronic device is target media stream data in the first media stream. That is, the recording instruction may be a recording instruction set in a customized manner on the electronic device, which is triggered when the electronic device detects that the current playing content is the target media stream data. The target media stream data may also be understood as highlight content. For example, for highlight content in a game battle, the electronic device may automatically detect and identify the highlight content, automatically encode part or all of the cached audio data after detecting it, and store the encoded data.
For detailed description of the recording instruction, reference is made to the description of step S102 in embodiment 1 and step S902 in embodiment 2, which are not described in detail herein.
S1002, encoding the first audio data based on the recording instruction to obtain second audio data in the advanced audio coding aac format, in response to the recording instruction.
In this embodiment of the present application, the first audio data is audio data in an uncoded pulse modulation recording PCM format corresponding to the first time, which is buffered in the electronic device.
In some possible implementations, the encoding the first audio data based on the recording instruction to obtain the second audio data in the aac format includes: the first audio data is determined from third audio data, which is audio data already buffered in the electronic device.
In some possible implementations, the third audio data is simply audio data that has already been buffered in the electronic device, whose start time and end time may be independent of the first time. The third audio data is, for example, audio data corresponding to target data content in the first media stream data, where the target data content may also be understood as media stream data corresponding to a highlight moment. Then, after receiving the recording instruction, the electronic device selects, from the cached third audio data, the audio data corresponding to a period before the first time and a period after the first time as the first audio data.
In some possible implementations, the third audio data may also be audio data corresponding to the first time, for example, the third audio data is all audio data of the first media stream before the first time, or the third audio data is audio data of the first media stream in a first period before the first time, or the third audio data includes audio data of the first media stream in the first period before the first time and audio data of the first media stream in a second period after the first time, which is not limited herein, and the relation between the third audio data and the first time may be specifically referred to in the description related to S101 in embodiment 1 (for example, description in case 1 and case 2 in S101).
In some possible implementations, the end time of the first audio data is a first time, and the duration of the first audio data is a first encoding duration. Reference is made in particular to the relevant description of example 1 herein with respect to fig. 5, which is not described in detail here.
In some descriptions, the first encoding duration is also referred to as a maximum encoding duration, where the maximum encoding duration is a maximum duration of data that the electronic device is allowed to encode. That is, the duration of the first audio data is less than or equal to the maximum encoding duration.
In some possible implementations, the duration of the first audio data is less than or equal to a first encoding duration, and the duration of the third audio data is greater than the first encoding duration and less than or equal to a second encoding duration.
In an exemplary embodiment, in a scenario where the first media stream data includes audio data and video data, the electronic device updates, in real time according to the current playing time, the audio data in the first time period before the current playing time to serve as the third audio data, and in parallel updates the video data corresponding to the current playing time. However, because video data is larger, the video data is sometimes buffered too slowly due to insufficient CPU resources, so that the time periods of the buffered audio data and video data do not fully match. For example, the current playing time is 09:50 (9 minutes 50 seconds), the time period of the third audio data is 09:30 to 09:50, but the time period of the video data is 09:20 to 09:40; that is, the overlapping time period of the third audio data and the video data is only 09:30 to 09:40. In addition, the time period presented to the user is typically that of the video data, and it is used for selecting both the start time and the end time of the first audio data. The third audio data therefore cannot provide the audio data of 09:20 to 09:30 for audio-video synthesis with the video data of 09:20 to 09:30. It can be understood that if the time period of the first audio data includes a non-overlapping period, that is, any period in which the third audio data cannot overlap the video data (here, 09:20 to 09:30), the electronic device needs to obtain through another process the audio that the third audio data cannot provide, which increases the response time of the electronic device to the recording instruction and degrades the user experience.
To address the above problem, in the embodiment of the present application, the duration of the third audio data is set to be longer than the first encoding duration, that is, the first encoding duration is kept shorter than the duration of the buffered video data. This reduces, to a certain extent, the probability that the time period of the first audio data includes a non-overlapping period, reduces the response duration of the electronic device to the recording instruction, and thereby improves the user experience.
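To make the overlap condition concrete, the following Java sketch checks whether a requested window lies inside the intersection of the cached audio and video time periods; the second-based timestamps mirror the 09:20 to 09:50 example above and are otherwise illustrative.

```java
// A requested window is servable without extra buffering only if it lies
// inside the intersection of the cached audio and video periods. Times are
// seconds from the start of playback.
final class OverlapCheck {
    static boolean servable(long reqStart, long reqEnd,
                            long audioStart, long audioEnd,
                            long videoStart, long videoEnd) {
        long overlapStart = Math.max(audioStart, videoStart);
        long overlapEnd = Math.min(audioEnd, videoEnd);
        return reqStart >= overlapStart && reqEnd <= overlapEnd;
    }

    public static void main(String[] args) {
        // Audio cached 09:30-09:50, video cached 09:20-09:40 -> overlap 09:30-09:40.
        System.out.println(servable(560, 570, 570, 590, 560, 580)); // 09:20-09:30 -> false
        System.out.println(servable(570, 580, 570, 590, 560, 580)); // 09:30-09:40 -> true
    }
}
```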
In addition, in consideration of the cache pressure problem of the electronic device, the duration of the third audio data may be less than or equal to the second encoding duration, for example, the duration of the third audio data may be less than or equal to 20s or 30s or other suitable duration, so as to reduce the cache pressure of the electronic device.
It will be appreciated that, based on the requirement, the duration of the third audio data may be longer than the second encoding duration, which is not limited herein.
In some possible implementations, the start time of the first audio data is earlier than the first time, the end time of the first audio data is later than the first time, and the duration of the first audio data is the first encoding duration.
Illustratively, determining the first audio data from the third audio data includes: taking a second time as the start time of the first audio data, and extracting the first audio data from the third audio data based on the first encoding duration, where the second time is earlier than the first time and the end time of the first audio data is later than the first time; or taking a third time as the end time of the first audio data, and extracting the first audio data from the third audio data based on the first encoding duration, where the third time is later than the first time and the start time of the first audio data is earlier than the first time. For details, reference may be made to the related description of fig. 6 in step S103 in embodiment 1, which is not repeated here.
In some possible implementations, determining the first audio data from the buffered third audio data includes: determining a first starting time and a first ending time according to user input, wherein the first starting time is not earlier than the starting time of the third audio data, the first ending time is not later than the ending time of the third audio data, and the difference value between the first starting time and the first ending time is smaller than or equal to the first coding duration; and determining audio data corresponding to the first starting time and the first ending time in the third audio data as the first audio data. For example, reference is specifically made to the description of case 3-3 and case 4-2 in step S103 in example 1 herein, and detailed description thereof will be omitted.
In some possible implementations, before the receiving of the recording instruction, the method further includes: caching the third audio data through a first array, where each array element in the first array is used to record the audio data cached in a buffer and the timestamp corresponding to that audio data. Reference is made to the description of case 1 in step S101 in embodiment 1, which is not repeated here.
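A minimal Java sketch of such a timestamped cache follows; the class name, millisecond timestamps, and eviction policy are illustrative assumptions rather than the patent's implementation.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayDeque;

// Each element pairs a PCM chunk with its timestamp; chunks older than the
// supported forward-caching window (the "first duration") are evicted.
final class PcmCache {
    private static final class Entry {
        final long timestampMs;
        final byte[] pcm;
        Entry(long t, byte[] p) { timestampMs = t; pcm = p; }
    }

    private final ArrayDeque<Entry> entries = new ArrayDeque<>();
    private final long windowMs; // the first duration

    PcmCache(long windowMs) { this.windowMs = windowMs; }

    // Called as each buffer of PCM data is produced during playback.
    synchronized void append(long timestampMs, byte[] pcm) {
        entries.addLast(new Entry(timestampMs, pcm));
        while (!entries.isEmpty()
                && entries.peekFirst().timestampMs < timestampMs - windowMs) {
            entries.removeFirst(); // drop data outside the first time period
        }
    }

    // Called on a recording instruction: extract the second time period.
    synchronized byte[] extract(long startMs, long endMs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Entry e : entries) {
            if (e.timestampMs >= startMs && e.timestampMs < endMs) {
                out.write(e.pcm, 0, e.pcm.length);
            }
        }
        return out.toByteArray(); // the first audio data, ready for encoding
    }
}
```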
In one possible implementation manner, before the receiving the recording instruction, the method further includes: caching audio data corresponding to a third time period in the third audio data, wherein the ending time of the third time period is the first time; after the receiving the recording instruction, before the encoding the first audio data based on the recording instruction, the method further includes: and caching the audio data corresponding to a fourth time period in the third audio data, wherein the starting moment of the fourth time period is the first moment. Reference is specifically made to the description of case 2 in step S101 in example 1 herein, and detailed description thereof will be omitted.
In some possible implementations, the first media stream includes audio data and video data, and after the second audio data in the advanced audio coding aac format is obtained, the method further includes: writing the second audio data into the audio-video synthesizer (media multiplexer) according to the preset storage path; and performing audio-video synthesis based on the media multiplexer and the second audio data to obtain second media stream data corresponding to the first time, where the second media stream data is media stream data in the moving picture experts group mp4 format. Reference is made to the description of storing the second audio data in step S103 in embodiment 1, which is not repeated here.
In one possible implementation, in a scenario where the first media stream includes only audio data (i.e., a sound recording scenario), the playing of the first media stream in S1001 means that the electronic device is recording audio, and the recording instruction is an instruction to end the recording (e.g., the end instruction in S902). The encoding of the first audio data based on the recording instruction in S1002 then means encoding all the recorded audio data. In particular, reference may be made to the relevant description of embodiment 2 herein, which is not detailed here.
For example, referring to fig. 11, fig. 11 is a schematic structural diagram of an electronic device 100 according to an embodiment of the present application, and a detailed description is given below by using a mobile terminal as an example of the electronic device.
The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a sensor module 180, keys 190, a camera 191, a display 192, and a subscriber identity module (subscriber identification module, SIM) card interface 193, etc. The sensor module 180 may include a touch sensor, a pressure sensor, a gyroscope sensor, a barometric sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity sensor, a fingerprint sensor, a temperature sensor, an ambient light sensor, a bone conduction sensor, etc.
It should be understood that the illustrated structure of the embodiment of the present invention does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, electronic device 100 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
The controller may be a neural hub and a command center of the electronic device 100, among others. The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby improving the efficiency of the system.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous receiver transmitter (universal asynchronous receiver/transmitter, UART) interface, a mobile industry processor interface (mobile industry processor interface, MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (subscriber identity module, SIM) interface, and/or a universal serial bus (universal serial bus, USB) interface, among others.
The I2S interface may be used for audio communication. In some embodiments, the processor 110 may contain multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 via an I2S bus to enable communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through the I2S interface, to implement a function of answering a call through the bluetooth headset.
PCM interfaces may also be used for audio communication to sample, quantize and encode analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface to implement a function of answering a call through the bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus for asynchronous communications. The bus may be a bi-directional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is typically used to connect the processor 110 with the wireless communication module 160. For example: the processor 110 communicates with a bluetooth module in the wireless communication module 160 through a UART interface to implement a bluetooth function. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through a UART interface, to implement a function of playing music through a bluetooth headset.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal or as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera 191, the display 192, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, etc.
It should be understood that the interfacing relationship between the modules illustrated in the embodiments of the present invention is only illustrative, and is not meant to limit the structure of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also use different interfacing manners, or a combination of multiple interfacing manners in the foregoing embodiments.
The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution for wireless communication including 2G/3G/4G/5G, etc., applied to the electronic device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 can amplify the signal modulated by the modem processor, and convert the signal into electromagnetic waves through the antenna 1 to radiate. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low frequency baseband signal to the baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs sound signals through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or video through the display 192. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional module, independent of the processor 110.
The wireless communication module 160 may provide solutions for wireless communication including wireless local area network (wireless local area networks, WLAN) (e.g., wireless fidelity (wireless fidelity, wi-Fi) network), bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field wireless communication technology (near field communication, NFC), infrared technology (IR), etc., as applied to the electronic device 100. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates the electromagnetic wave signals, filters the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency modulate it, amplify it, and convert it to electromagnetic waves for radiation via the antenna 2.
In some embodiments, antenna 1 and mobile communication module 150 of electronic device 100 are coupled, and antenna 2 and wireless communication module 160 are coupled, such that electronic device 100 may communicate with a network and other devices through wireless communication techniques. The wireless communication techniques may include the Global System for Mobile communications (global system for mobile communications, GSM), general packet radio service (general packet radio service, GPRS), code division multiple access (code division multiple access, CDMA), wideband code division multiple access (wideband code division multiple access, WCDMA), time division code division multiple access (time-division code division multiple access, TD-SCDMA), long term evolution (long term evolution, LTE), BT, GNSS, WLAN, NFC, FM, and/or IR techniques, among others. The GNSS may include a global satellite positioning system (global positioning system, GPS), a global navigation satellite system (global navigation satellite system, GLONASS), a beidou satellite navigation system (beidou navigation satellite system, BDS), a quasi zenith satellite system (quasi-zenith satellite system, QZSS) and/or a satellite based augmentation system (satellite based augmentation systems, SBAS).
The electronic device 100 implements display functions through a GPU, a display screen 192, and an application processor, etc. The GPU is a microprocessor for image processing, and is connected to the display 192 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display 192 is used to display images, videos, and the like. The display 192 includes a display panel. The display panel may employ a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (organic light-emitting diode, OLED), an active-matrix organic light-emitting diode (active-matrix organic light emitting diode, AMOLED), a flexible light-emitting diode (flex light-emitting diode, FLED), a Mini LED, a Micro LED, a Micro-OLED, a quantum dot light-emitting diode (quantum dot light emitting diodes, QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N display screens 192, N being a positive integer greater than 1.
The electronic device 100 may implement photographing functions through an ISP, a camera 191, a video codec, a GPU, a display 192, an application processor, and the like.
The ISP is used to process the data fed back by the camera 191. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing, to be converted into an image visible to the naked eye. The ISP can also optimize the noise, brightness, and skin color of the image, as well as parameters such as the exposure and color temperature of the shooting scene. In some embodiments, the ISP may be provided in the camera 191.
The camera 191 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format. In some embodiments, the electronic device 100 may include 1 or N cameras 191, N being a positive integer greater than 1.
The digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the electronic device 100 selects a frequency bin, the digital signal processor is used to perform a Fourier transform on the frequency bin energy, and the like.
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record video in a variety of encoding formats, such as: dynamic picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
The NPU is a neural-network (NN) computing processor, and can rapidly process input information by referencing a biological neural network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent awareness of the electronic device 100 may be implemented through the NPU, for example: image recognition, face recognition, speech recognition, text understanding, etc. The decision model provided by the embodiment of the application can also be realized through the NPU.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device 100. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
The internal memory 121 may be used to store computer executable program code including instructions. The processor 110 executes various functional applications of the electronic device 100 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data created during use of the electronic device 100 (e.g., audio data, phonebook, etc.), and so on. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like.
The electronic device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music play, audio video play, sound recording, etc.
The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or a portion of the functional modules of the audio module 170 may be disposed in the processor 110.
The electronic device may play the first media stream data through the processor 110, the display 192, and the audio module 170, and buffer the PCM format audio data (third audio data) in the buffer in the internal memory 121 when playing the first media stream. Whether a recording instruction is received is detected by a touch sensor in the sensor module 180, and after the recording instruction is received, the buffered first audio data is encoded by the processor 110 and the audio module 170. The description of terms such as first media stream, first audio data, recording instructions, etc. may refer to the method embodiments herein and will not be described in detail herein.
It will be appreciated that the processor 110 may be configured to perform the method or step of any of the method embodiments described in embodiments 1 to 3, or may be configured to cooperate with other modules in the electronic device 100 to perform the method or step of any of the method embodiments described in embodiments 1 to 3, which is not limited herein.
Fig. 12 is a software configuration block diagram of the electronic device 100 of the embodiment of the present application.
The layered architecture divides the software into several layers, and each layer has a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the system is divided into four layers, from top to bottom: an application layer, an application framework layer, a runtime (Runtime) and system libraries, and a kernel layer.
The application layer may include a series of application packages.
As shown in fig. 12, the application package may include apps such as Camera, Gallery, Calendar, Phone, Maps, Navigation, WLAN, Bluetooth, Music, Video, and Messages.
In the embodiment of the present application, the application layer may further include an audio processing module, where the audio processing module is configured to perform the audio processing method in the embodiment of the present application.
In some embodiments of the present application, the audio processing module may also be located in other levels of the software architecture, such as an application framework layer, a system library, a kernel layer, etc., without limitation.
The application framework layer provides an application programming interface (application programming interface, API) and programming framework for application programs of the application layer. The application framework layer includes a number of predefined functions.
As shown in fig. 12, the application framework layer may include a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, and the like.
The window manager is used for managing window programs. The content provider is used to store and retrieve data and make the data accessible to applications. The view system includes visual controls, such as controls for displaying text and controls for displaying images, and may be used to build applications. The telephony manager is used to provide the communication functions of the electronic device 100. The resource manager provides various resources to applications, such as localized strings, icons, images, layout files, and video files. The notification manager enables an application to display notification information in the status bar and can be used to convey notification-type messages, which automatically disappear after a short stay without requiring user interaction.
The runtime includes the core libraries and a virtual machine, and is responsible for scheduling and management of the system.
The core libraries consist of two parts: one part is the functions that the programming language (for example, the Java language) needs to call, and the other part is the core libraries of the system.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the programming files (for example, Java files) of the application layer and the application framework layer as binary files, and performs functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include a plurality of functional modules, for example: a surface manager (surface manager), media libraries (Media Libraries), a three-dimensional graphics processing library (for example, OpenGL ES), and a two-dimensional graphics engine (for example, SGL).
The kernel layer is a layer between hardware and software. The kernel layer may contain display drivers, camera drivers, audio drivers, sensor drivers, virtual card drivers, etc.
As used in the above embodiments, the term "when" may be interpreted, depending on the context, as "if", "after", "in response to determining", or "in response to detecting". Similarly, the phrase "when it is determined" or "if (a stated condition or event) is detected" may be interpreted, depending on the context, as "if it is determined", "in response to determining", "when (a stated condition or event) is detected", or "in response to detecting (a stated condition or event)".
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid-state disk).
Those of ordinary skill in the art will appreciate that all or part of the above method embodiments may be accomplished by a computer program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the flows of the above method embodiments. The aforementioned storage medium includes any medium that can store program code, such as a ROM, a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are merely intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or replace some technical features thereof with equivalents; such modifications and replacements do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (13)

1. An audio processing method, applied to an electronic device, wherein the electronic device supports playing a first media stream, the method comprising:
receiving a recording instruction in the process of playing the first media stream, wherein the recording instruction is used for instructing the electronic device to record audio data corresponding to a first time, and the first time is the playing time of the first media stream when the recording instruction is received;
and encoding the first audio data based on the recording instruction to obtain second audio data in an advanced audio coding (AAC) format, wherein the first audio data is the unencoded audio data in a pulse code modulation (PCM) format that is cached in the electronic device and corresponds to the first time.
2. The method of claim 1, wherein the encoding the first audio data based on the recording instruction to obtain second audio data in an advanced audio coding (AAC) format comprises:
determining the first audio data from third audio data, wherein the third audio data is the audio data cached in the electronic device;
and encoding the first audio data based on the recording instruction to obtain the second audio data.
3. The method of claim 2, wherein the first audio data has a duration less than or equal to a first encoding duration and the third audio data has a duration greater than the first encoding duration and less than or equal to a second encoding duration.
4. The method according to any one of claims 1 to 3, wherein an end time of the first audio data is the first time, and a duration of the first audio data is a first encoding duration.
5. The method according to any one of claims 1 to 3, wherein a start time of the first audio data is earlier than the first time, an end time of the first audio data is later than the first time, and a duration of the first audio data is a first encoding duration.
6. The method according to claim 2 or 3, wherein the determining the first audio data from third audio data comprises:
determining a first start time and a first end time according to user input, wherein the difference between the first start time and the first end time is less than or equal to a first encoding duration;
and determining the audio data in the third audio data corresponding to the first start time and the first end time as the first audio data.
7. The method of claim 4, wherein before the receiving a recording instruction, the method further comprises:
caching the third audio data through a first array, wherein each array element in the first array is used for recording audio data cached in a buffer area and a timestamp corresponding to that audio data.
8. The method of any one of claims 1 to 7, wherein the recording instruction is a user-initiated recording instruction.
9. The method according to any one of claims 1 to 7, wherein the recording instruction is triggered by the electronic device according to a target condition, and the target condition is that the current playing content of the electronic device is target media stream data in the first media stream.
10. The method according to any one of claims 1 to 9, wherein the first media stream comprises audio data and video data, and after the second audio data in the advanced audio coding (AAC) format is obtained, the method further comprises:
writing the second audio data into an audio-video synthesizer according to a preset storage path;
and performing audio-video synthesis based on the audio-video synthesizer and the second audio data to obtain second media stream data corresponding to the first time, wherein the second media stream data is media stream data in mp4 format.
11. An electronic device, comprising: one or more processors, a memory, and a display screen;
wherein the memory is coupled to the one or more processors and is configured to store computer program code, the computer program code comprises computer instructions, and the one or more processors invoke the computer instructions to cause the electronic device to perform the method of any one of claims 1 to 10.
12. A chip system, applied to an electronic device, wherein the chip system comprises one or more processors, and the one or more processors are configured to invoke computer instructions to cause the electronic device to perform the method of any one of claims 1 to 10.
13. A computer readable storage medium comprising instructions that, when run on an electronic device, cause the electronic device to perform the method of any one of claims 1 to 10.
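As an editorial illustration of claim 1's encoding step (not part of the claims): on Android, the cached PCM could be encoded to AAC with the public MediaCodec API, roughly as sketched below. The patent does not name MediaCodec, and the sample rate, channel count, bit rate, and AAC-LC profile here are assumptions.

```java
import android.media.MediaCodec;
import android.media.MediaCodecInfo;
import android.media.MediaFormat;
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/** Hedged sketch of PCM -> AAC encoding; 16-bit interleaved PCM assumed. */
final class AacEncoder {
    byte[] encode(byte[] pcm, int sampleRate, int channels) throws Exception {
        MediaFormat format = MediaFormat.createAudioFormat(
                MediaFormat.MIMETYPE_AUDIO_AAC, sampleRate, channels);
        format.setInteger(MediaFormat.KEY_AAC_PROFILE,
                MediaCodecInfo.CodecProfileLevel.AACObjectLC);
        format.setInteger(MediaFormat.KEY_BIT_RATE, 128_000);  // assumed

        MediaCodec codec = MediaCodec.createEncoderByType(
                MediaFormat.MIMETYPE_AUDIO_AAC);
        codec.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE);
        codec.start();

        ByteArrayOutputStream aac = new ByteArrayOutputStream();
        MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
        int offset = 0;
        boolean inputDone = false;
        while (true) {
            if (!inputDone) {
                int in = codec.dequeueInputBuffer(10_000);
                if (in >= 0) {
                    ByteBuffer buf = codec.getInputBuffer(in);
                    int len = Math.min(buf.remaining(), pcm.length - offset);
                    // Presentation time derived from the byte offset.
                    long ptsUs = (long) offset * 1_000_000L
                            / ((long) sampleRate * channels * 2);
                    buf.put(pcm, offset, len);
                    offset += len;
                    inputDone = offset >= pcm.length;
                    codec.queueInputBuffer(in, 0, len, ptsUs,
                            inputDone ? MediaCodec.BUFFER_FLAG_END_OF_STREAM : 0);
                }
            }
            int out = codec.dequeueOutputBuffer(info, 10_000);
            if (out >= 0) {
                ByteBuffer buf = codec.getOutputBuffer(out);
                byte[] frame = new byte[info.size];
                buf.get(frame);
                aac.write(frame);  // raw AAC frames; a muxer or ADTS headers
                                   // are needed before the data is playable
                codec.releaseOutputBuffer(out, false);
                if ((info.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
                    break;
                }
            }
        }
        codec.stop();
        codec.release();
        return aac.toByteArray();
    }
}
```

Because only the short cached slice is encoded, and only when a recording instruction arrives, no encoder runs during normal playback, which is consistent with the CPU-occupancy benefit described in this application.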
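Similarly, claim 6's selection of the first audio data from the cached third audio data can be sketched against the PcmCache example given earlier in this document (the PcmCache type is defined there). The one-second "first encoding duration" below is an assumption, not a value from this application.

```java
import java.util.ArrayDeque;

/** Hedged sketch of claim 6: clamp the user's selection to the first
 *  encoding duration and slice the cache (PcmCache is the sketch above). */
final class FirstAudioSelector {
    static ArrayDeque<PcmCache.Chunk> select(
            PcmCache cache, long firstStartUs, long firstEndUs) {
        long firstEncodingUs = 1_000_000L;  // assumed first encoding duration
        // Enforce the claim's condition that the difference between the
        // first start time and the first end time does not exceed it.
        long endUs = Math.min(firstEndUs, firstStartUs + firstEncodingUs);
        return cache.slice(firstStartUs, endUs);
    }
}
```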
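Finally, claim 10's "audio-video synthesizer" maps naturally onto a container muxer. The sketch below uses Android's MediaMuxer to write the encoded audio into an mp4 file; the output path is an assumption, and a real implementation would also add the video track of the first media stream.

```java
import android.media.MediaCodec;
import android.media.MediaFormat;
import android.media.MediaMuxer;
import java.nio.ByteBuffer;

/** Hedged sketch of claim 10: write encoded AAC samples into an mp4. */
final class Mp4Writer {
    void writeAac(String outputPath, MediaFormat encoderOutputFormat,
                  ByteBuffer[] frames, MediaCodec.BufferInfo[] infos)
            throws Exception {
        MediaMuxer muxer = new MediaMuxer(
                outputPath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4);
        // The format must be the encoder's *output* format, since it
        // carries the codec-specific data the mp4 audio track needs.
        int track = muxer.addTrack(encoderOutputFormat);
        muxer.start();
        for (int i = 0; i < frames.length; i++) {
            muxer.writeSampleData(track, frames[i], infos[i]);
        }
        muxer.stop();
        muxer.release();
    }
}
```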
CN202210793866.7A 2022-07-07 2022-07-07 Audio processing method and electronic equipment Active CN116052701B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210793866.7A CN116052701B (en) 2022-07-07 2022-07-07 Audio processing method and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210793866.7A CN116052701B (en) 2022-07-07 2022-07-07 Audio processing method and electronic equipment

Publications (2)

Publication Number Publication Date
CN116052701A 2023-05-02
CN116052701B 2023-10-20

Family

ID=86111995

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210793866.7A Active CN116052701B (en) 2022-07-07 2022-07-07 Audio processing method and electronic equipment

Country Status (1)

Country Link
CN (1) CN116052701B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117082403A (en) * 2023-08-18 2023-11-17 荣耀终端有限公司 Multipath audio processing method and electronic equipment

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012209958A (en) * 2012-06-08 2012-10-25 Mitsubishi Electric Corp Video audio recording apparatus and video audio recording method
CN106534748A (en) * 2016-11-30 2017-03-22 青岛海信移动通信技术股份有限公司 Media file recording and playing method and terminal
CN107800988A (en) * 2017-11-08 2018-03-13 青岛海信移动通信技术股份有限公司 A kind of method and device of video record, electronic equipment
CN109600660A (en) * 2018-08-01 2019-04-09 北京微播视界科技有限公司 Method and apparatus for recorded video
CN109862415A (en) * 2018-11-26 2019-06-07 努比亚技术有限公司 A kind of record screen method, double screen terminal and computer readable storage medium
US20200382741A1 (en) * 2019-04-24 2020-12-03 Wangsu Science & Technology Co., Ltd. Method and system for video recording
US20200382828A1 (en) * 2019-04-24 2020-12-03 Wangsu Science & Technology Co., Ltd. Method, client, and terminal device for screen recording
CN113572954A (en) * 2021-06-15 2021-10-29 荣耀终端有限公司 Video recording method, electronic device, medium, and program product
CN113643728A (en) * 2021-08-12 2021-11-12 荣耀终端有限公司 Audio recording method, electronic device, medium, and program product
CN113784073A (en) * 2021-09-28 2021-12-10 深圳万兴软件有限公司 Method, device and related medium for synchronizing sound and picture of sound recording and video recording

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MOUTAMAN MIRGHANI: "Evaluation of the quality of encoded Quran digital audio recording", 2017 International Conference on Communication, Control, Computing and Electronics Engineering (ICCCCEE) *
LIN Huan: "Comparison of AVS2 Audio and MP3 and the Implementation of Their Recording and Playback", China Master's Theses Full-text Database *

Also Published As

Publication number Publication date
CN116052701B (en) 2023-10-20

Similar Documents

Publication Publication Date Title
WO2020253719A1 (en) Screen recording method and electronic device
CN113726950B (en) Image processing method and electronic equipment
CN111314775B (en) Video splitting method and electronic equipment
CN113556598A (en) Multi-window screen projection method and electronic equipment
CN113556479B (en) Method for sharing camera by multiple applications and electronic equipment
CN113935898A (en) Image processing method, system, electronic device and computer readable storage medium
CN116052701B (en) Audio processing method and electronic equipment
CN117133306A (en) Stereo noise reduction method, apparatus and storage medium
CN116418995A (en) Scheduling method of coding and decoding resources and electronic equipment
CN111131019B (en) Multiplexing method and terminal for multiple HTTP channels
CN116965038A (en) Multimedia file playing method and related device
CN114793283A (en) Image encoding method, image decoding method, terminal device, and readable storage medium
CN115700451A (en) Service recommendation method and electronic equipment
CN116055613B (en) Screen projection method and device
CN116700578B (en) Layer synthesis method, electronic device and storage medium
CN116095512B (en) Photographing method of terminal equipment and related device
CN115460343B (en) Image processing method, device and storage medium
CN116668762B (en) Screen recording method and device
CN116347217B (en) Image processing method, device and storage medium
CN116055738B (en) Video compression method and electronic equipment
CN117406654B (en) Sound effect processing method and electronic equipment
CN116684521B (en) Audio processing method, device and storage medium
CN116668764B (en) Method and device for processing video
WO2022206600A1 (en) Screen projection method and system, and related apparatus
CN117956264A (en) Shooting method, electronic device, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant